

Alan Kay




Still trying to learn how to think better 






Alan Kay
Have designed a few programming languages - Updated 20h


How is math translated into code? For example, ray tracing in computer 
graphics, how is the math and physics equation of that translated into a 
code? 


The answer by Jim Lux will help. 


Here's another slant on this. A lot of early computer programming was about taking 
mathematical formulas and writing code to compute them. One of the very first 
“higher level programming languages” (released in 1956) was FORTRAN, whose name 
was derived from FORmula TRANslation. Its main ability was to allow recognizable
arithmetic formulas more or less in algebraic form to be written directly in the code. 


(I have a Quora answer somewhere about FORTRAN. Thanks to Dane Clarke, here is
the link: 


Alan Kay - 3y
What was the historical significance of FORTRAN? 


The historical significance of FORTRAN is primarily due to the level of software engineering i... 


) 


This effort was successful because John Backus and his IBM team put an enormous
amount of work into not just doing the translation, but in optimizing the code that 
was produced to compete successfully with hand written code. 


For a time, FORTRAN was actually regarded as “early AI”. Most subsequent
programming languages — Algol, C, C++, Pascal, Python, etc. — have also 
implemented “formula translation” and similar optimizations. 


(Added Nov 24 2023) I realized that the short answer above is a bit cryptic for the level
of the question. What FORTRAN brought to the table was to have a form in which 
simple algebraic formulas as used on paper could be put on punched cards in almost 
the same form. 


AREA = SQRT( S * (S - IA) * (S - IB) * (S - IC) )
Note that SQRT(...) is used instead of the radical sign, and that multiplication is explicit
("*").


At the next level, most computing machines have an ADD instruction and usually a 
SUBtract instruction. Most will also have a MUL instruction (some will have a
subroutine for this). And most will not have a SQRT instruction, and will do this via a 
subroutine. 


The task of the FORTRAN compiler is to translate the characters in the example line to 
a sequence of machine instructions that will perform the indicated computation. The 
“variables” will be modeled by words in memory (“core” memory in those days). Most 
machines back then didn’t have a stack, so the compiler would also allocate temporary 
variables to hold intermediate results. 


For example: 
LDA S
SUB IC
STA TEMP1




would compute the value of the last subexpression. The other subexpressions would 
be computed the same way and stored in TEMP2 and TEMP3. 


Then the machine code would do the multiplications 
LDA S 
MUL TEMP1 
MUL TEMP2 
MUL TEMP3


This will be the value of the parameter for the SQRT subroutine, which might be called 
thusly: 


JMPSUB SQRT 
STA AREA 
which will assume its parameter is in the "A" register. 


A convention on this machine might be to leave the result in the "A" register and 
return to the instruction word after the JMPSUB which will store the result in AREA. 


Note that this code could be optimized ... 
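
The translation walked through above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the FORTRAN compiler worked: the one-register machine, the tuple encoding of instructions, and the names `compile_formula` and `run` are all hypothetical, and Python's own `ast` module is borrowed to parse the formula (which happens to be a valid Python expression). The compiler parks each parenthesized subexpression in a temporary, starting with the last one, so its output begins with the same LDA S / SUB IC / STA TEMP1 shown above; it emits the multiplications in the opposite order, which gives the same product.

```python
import ast
import math

def compile_formula(target, source):
    """Translate `target = source` into (opcode, operand) pairs for a one-register machine."""
    code = []
    temp_count = 0

    def into_memory(node):
        """Return the name of a memory word that holds the value of `node`."""
        nonlocal temp_count
        if isinstance(node, ast.Name):
            return node.id                  # variables already live in memory
        into_accumulator(node)              # compute the subexpression in A ...
        temp_count += 1
        name = f"TEMP{temp_count}"
        code.append(("STA", name))          # ... and park it in a fresh temporary
        return name

    def into_accumulator(node):
        """Emit code that leaves the value of `node` in the A register."""
        if isinstance(node, ast.Name):
            code.append(("LDA", node.id))
        elif isinstance(node, ast.BinOp):
            opcode = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}[type(node.op)]
            right = into_memory(node.right) # right operand must be in memory
            into_accumulator(node.left)     # left operand goes into A
            code.append((opcode, right))
        elif isinstance(node, ast.Call):
            into_accumulator(node.args[0])  # parameter is passed in A
            code.append(("JMPSUB", node.func.id))
        else:
            raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    into_accumulator(ast.parse(source, mode="eval").body)
    code.append(("STA", target))
    return code

def run(code, memory):
    """Execute the instruction pairs; `memory` maps variable names to numbers."""
    a = 0.0                                 # the "A" register
    subroutines = {"SQRT": math.sqrt}       # result is left in A, as in the text
    for opcode, operand in code:
        if opcode == "LDA":
            a = memory[operand]
        elif opcode == "STA":
            memory[operand] = a
        elif opcode == "ADD":
            a += memory[operand]
        elif opcode == "SUB":
            a -= memory[operand]
        elif opcode == "MUL":
            a *= memory[operand]
        elif opcode == "JMPSUB":
            a = subroutines[operand](a)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return memory

program = compile_formula("AREA", "SQRT(S * (S - IA) * (S - IB) * (S - IC))")
memory = run(program, {"S": 6.0, "IA": 3.0, "IB": 4.0, "IC": 5.0})
print(memory["AREA"])
```

With S = 6 and sides 3, 4, 5 the temporaries hold 1, 2 and 3, the product is 36, and AREA ends up as 6.0.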


Bottom line is that human conventions for “formulas” can be used in a program if 
there is a compiler that can translate the formulas into the machine code of the 
machine being used while preserving the meaning of the original goal. 


A very early famous (and famously simple) system for translating from grammars of 
programming languages to runnable code is “Meta II” by Val Schorre. The Wikipedia
article has an explanation of it and also has links to the original papers (read this one,
it is a classic).


META II - Wikipedia 


META II is a domain-specific programming language for writing compilers .... 


https://en.wikipedia.org/wiki/META_II


Alan Kay


Have designed a few programming languages - Updated Nov 16 


What programming language does Alan Kay consider to be the most 
advanced? 


I used to track programming languages as well as I could (but as Jean Sammet
pointed out in the late 60s, “There already have been over 3000 PLs designed and 
made. How can we sort them?”). 


The ACM chooses languages for its “History of Programming Languages” (HOPL) 
series, but only from languages that have been in use for about 10 years, have 
affected other languages, etc. There have been four HOPL collections of histories so 
far, all published as books. 


I've given a few talks that have asked about why software and programming 
languages have not even kept pace with exponential Moore's Law HW scalings, and 
consequent changes in requirements (e.g. security, safety, specifications, etc.). 


The main events for me after I turned 80 three years ago have been trying to avoid
Covid here in London, and developing a bad cancer about two years ago (“It is what it
is”, no comments needed).


But I'm pretty sure now that there could be one or two really interesting languages 
out there that I'm not aware of. The last time I really checked was a few years ago
when I got asked to do 3 different “Software Engineering” talks at almost the same
time. This got me to scurry around to contact colleagues to find new colleagues 
(especially in the tactical trenches) who would have more accurate perceptions of the 
current situation. 


I found a number of top people I had not been aware of (for example, Gerard
Holzmann of NASA) who had accomplished really interesting and important things.
But I did not find anything of great new interest in the programming language arena.
Most of the top SEs I talked to were doing projects bolted onto quite weak (and old)
programming systems and requirements (for example, NASA requires its SW for e.g. a
Mars Rover to be done in C (!) even though it will have to run for a year or more on
Mars!). The result is that an enormous amount of computer science and engineering 
(and funding) is expended to take what is essentially a house of cards, and make it 
into a “real house”. 


Universities used to be a very good place to birth new programming languages, but
today's funders seem to have little taste for funding these — and worse, what I
see (anecdotal as it may be) are many students getting trained in languages for legacy
systems in business rather than being taught how to design and build new languages.


So: 


1. The real question is not “most advanced?”, but “up to today’s actual needs?”
2. Is there a good list of requirements for “up to today’s needs?”
3. Are there even any first-try example languages that are aimed at “up to
today’s needs?”


I find it hard to imagine that any reasonable candidate these days would not include
larger facilities (such as active knowledge of important parts of the world that the 
results will touch (think Boeing 737 Max Autopilot, the Facebook and Google crashes, 
etc.)). 


It should be clear that “something actually ‘expert’” is needed, but that the LLM ML 
approaches today are really dangerous without serious grounding in a cognitive 
knowledge-based system. 


ADDENDA 


<to appear> 




What does Alan Kay think about programming and teaching programming 
with copilots and LLMs of today? 


Alan Kay 
Have designed a few programming languages - Nov 12 


A little context and bias before I attempt to give an answer: I started programming as
a job ca 1961 in the USAF. The programming style was “data munged by imperatives 
with loops” (sound familiar?). An interesting wrinkle was that the machines we used at 
that time (and most computers) did not have what would be called an “operating 
system” — but there were yearnings, and also interesting subsets of code that helped 
“operators” run “jobs”. In just a few years, as the result of many different forces, a 
number of different operating systems started to appear, and most of them had as 
one of their main goals to be able to run multiple processes, including processes 
written by different people for different reasons. 


Big shifts in outlook started to happen, and many of them were ultimately about 
“trust” and “security” in “evermore complex systems of processes”. The history of this 
is really interesting — including that some of the best solutions were done very early 
in the game (some of these never got generally recognized by the run of the mill 
programming establishments). In order to get to the answer of the question, I'll only 
urge readers to look into this “epistemological revolution that was generally missed by 
the field”. 


Let me use *trust* as the central theme for this question and answer. *Trust* was 
certainly part of the criteria for a program before the days of multiprogramming, but it 
was basically whether a running single program could be trusted to do what it was 
supposed to do. Bugs were there, but only affected the program that had them — in 
rare cases, bugs could be propagated via “wrong data” written on tapes, etc., but in 
practice, almost all bugs were local, so the integrity of a program was quite a bit easier 
to ascertain and improve. 


When the desire for multiprogramming happened, it was attempted on hardware that 
could not confine the different processes (sometimes an entire core memory image 
(usually less than 150KB) was rolled in and out for each job to and from a drum or 
even tape memory). This led to some hardware assists of various kinds, and attempts 
to have the cake and also eat it. 


The bottom line today is that most computations are not as secure and trustworthy as 
they need to be (in part because some of the best solutions wound up being 
perceived as being “too expensive” and then abandoned). 


Note that *Trust* becomes front and center as Moore's Law advances, especially with 
advanced networking. 


Note that *Trust* becomes IMO the *dominant issue* when all that has gone before is
added to “NCANIP”s (non cognitive artificial non-intelligent processes) that are allowed to
run wild in forms that, as Harari has pointed out, “hack the language communications
systems of our species”.


Note that “Trust” is one of the deepest issues — much larger than “just computing” — 
when not just communication, but actual education is one of the main needs and 
goals. We want to be able to know the degree of trust we can allow for what our own 
minds come up with, what we hear from others, what we read, our teachers, etc. A big 
deal with science is that it was partially invented by humans — after hundreds of 
thousands of years — by learning better methods than trusting one’s senses, or mere 
beliefs of one’s cultures. 


A simple principle is that for most things that are automated and scaled, the *trust 
requirements* have to be vastly expanded and made vastly more strict. 


Finally, to try to answer the question ... (Summary: I don’t think it would be a good
idea at all) 


First, I don't think the way programming is generally taught (as I've seen it in a
number of major universities) is a good approach (and hasn't been for about 60 years 
now). Qualitative changes of many kinds have obsoleted the general approaches, the 
languages, the operating systems, the infrastructures, *and* the ability to *trust* what 
can be found online — Etc etc. 


Second, the genie has been out of the bottle for much of the Moore's Law expansions,
and this creates critical questions — as with most infrastructure issues — of how much 
effort should be expended to keep patching the problems vs actually addressing them 
with better, more secure, more trusted, and *more system* approaches. (Putting
typical businesses in charge of these tradeoffs has been an unmitigated disaster — 
but we can see that savvy and wisdom in government is not up to the task either — 
still, something like government will be required.) 


(Note, it used to be part of the job of universities to not just teach/help-learn ideas, 
but to be both “keepers of the flame” and the vanguard for “reinventing better fire”. 
(Claim/guess: most so-called “computer science” departments in universities today do 
neither — they seem to now be trade-schools for what existing 
businesses/organizations have gotten themselves into trouble with.)) 


Third, I like to praise the (actual) Internet design because (a) it was really done well,
and (b) I was not involved enough in it to claim any credit (thus, my praise should be
more “trustworthy”!). A key part of their design was to *not allow direct sending of
commands* — only bits could be sent. This means that (other) software inside each 
physical computer has the responsibility to *interpret* the bits, and the power to *do* 
(or not do) *some action* (note that this and other ideas of the Internet suggest ways to 
organize software processes, etc.). 


Fourth, I do think that an expert system could be made to *help* teach programming
much better than a book can, and better than most teachers of programming do now, 
so that most learners can *actually learn* much better than they do now. 


By “learn” I mean that important processes and relationships are created between the
learner's ears by their own brain. 


By “help” I mean that — especially when *changes in epistemological points of view
from one’s own common sense are required*, it can make a huge difference to be near
a “special human” whose personality is strong enough to make us rethink what we
think we know. Our “trust” in this person is not necessarily that they are “right” all the 
time, but that “they are consistently interesting, and thus worth listening to”. 
Occasionally, we can have that kind of trust in an author, but it is often all too easy to 
be lazy about rethinking ... 


Fifth, I think ML and LLMs, etc., do have a part in “actual machine intelligence” in the
future, but not nearly the most important parts. I don’t think the systems exhibited
today are to be trusted at all (and note that using such a system to provide 
explanations for what another such system has concluded, is just piling BS on BS). That 
humans also do this all the time is “interesting”, “dangerous” etc., but it is also why 
trying to move from *superstition* (this is actually what “reasoning by correlation” 
amounts to) to more scientific methods is critical for anything like civilization to be 
created. 


Finally, I think the big deal here is vastly larger than the vast disasters in computing
systems at all levels. If we look at human anthropology, we see a species that treats 
solitary confinement and banishment from society as punishments — we are a society 
because we can cooperate and trust a little — but when we are safely back in society, we
start competing like mad (and cheating like mad) as though the society is there to be 
strip-mined. Scaling and technique has made this a “Spaceship Earth” issue that can 
no longer be put off and ignored and violated. 


Robert Heinlein once warned: “The bull wears itself out on the cape and fails to see
the sword”.


Alan Kay

Have designed a few programming languages - Sep 12


What is machine code? Is it really the lowest level of programming? If it is, 
how does this make sense as we still have assembly languages and high- 
level languages like C++ and Java? 

This is a pretty good question. 


This answer is a bit more detailed than the others (but read the others, they have 
useful information). 


In “the olden days” of the early 60s when I was a journeyman programmer, we would
often call the lowest level in which the machine could be coded “absolute code” (or
“code absolute”, or “octal absolute” or “binary absolute”).


This is because Maurice Wilkes had invented “microcode” as a very good way to deal 
with the conflict between (a) what is called the “control problem” of a programmable 
computer, and (b) the desire for the machine to have a number of useful fast 
instructions, whose “control” would be “un-fun” in just logic gates. Some computers 
would have microcode and some not — all needed some form of control logic to 
route information (usually binary bits) around the machine from functional unit to 
functional unit. 


An early fun thing to learn about computers is that every part can be made from a 
single kind of building block — e.g. either a NAND or NOR gate. All other logical 
functions can be synthesized from these. Sometimes for convenience, three logical 
functions — AND, OR, and NOT (which can be made from NANDs or NORs) are used 
to help explanations. 
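
The "single kind of building block" claim is easy to check by hand. The sketch below is a minimal illustration, assuming bits are represented as the Python integers 0 and 1; the function names are made up for this example. It builds NOT, AND, OR and XOR out of nothing but NAND, and then a one-bit full adder out of those.

```python
def nand(a, b):
    """The only primitive: output is 0 exactly when both inputs are 1."""
    return 0 if (a and b) else 1

def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))

def xor_(a, b):
    shared = nand(a, b)
    return nand(nand(a, shared), nand(b, shared))

def full_adder(a, b, carry_in):
    """Add three bits; return (sum_bit, carry_out). Built only from the gates above."""
    partial = xor_(a, b)
    sum_bit = xor_(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return sum_bit, carry_out

# Check the adder against ordinary integer arithmetic for all eight input combinations.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            sum_bit, carry_out = full_adder(a, b, c)
            assert 2 * carry_out + sum_bit == a + b + c
```

Chaining such adders, carry to carry, gives the multi-bit "add" unit; the same exercise works with NOR as the sole primitive.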


Because there is a lag in the operation of a gate — meaning *time* is also involved —
it is easy to make a “clock”, which can then be used to help with the scheduling of 
events. For example, if we put a signal through a NOT gate the result will be inverted, 
and if we feed the result back to the input, we will get a train of digital pulses that are 
the “ticks” of the “clock”. 
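
A rough way to see the feedback clock is to simulate it in discrete time steps. This is an idealized sketch, assuming the gate's lag is a whole number of steps and signals are clean 0s and 1s (real inverters are analog devices, and practical ring oscillators usually chain an odd number of them); the function name and parameters are hypothetical.

```python
def not_gate_clock(steps, delay=1, initial=0):
    """Trace the output of a NOT gate whose output is wired back to its input.

    The gate's output at time t is the inverse of its input `delay` steps earlier,
    and because of the feedback wire that input is the gate's own earlier output.
    """
    in_flight = [initial] * delay    # values still working their way through the gate
    trace = []
    for _ in range(steps):
        output = 1 - in_flight[0]    # invert the value that entered `delay` steps ago
        in_flight = in_flight[1:] + [output]
        trace.append(output)
    return trace

print(not_gate_clock(8))             # lag of one step
print(not_gate_clock(8, delay=2))    # a slower gate gives a slower clock
```

In this model the signal flips once per gate delay, so a full tick-tock period lasts twice the delay: the lag of the gate is what sets the clock rate.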


The simplest things made from primitive gates are units that will add, compare, 
remember, "decode", clocks, etc. These are analogous to a railway switching yard with 
trains that can move on tracks. The “control” part is usually made of humans in 
coordination, routing the trains by switching switches, etc. The “control” of a CPU is 
logic that will route the “trains of bits" around the switchyard for various purposes. 


In a long-ago microcoded machine, a microcode “word” is a sequence of 1s and 0s
that will open and close gates that will act as the routing switches. Usually the
microcode words are looked at in sequence, and there can be branches, etc. The
earliest versions would use something like diodes to represent the 1s and 0s, and the
microcode would be wired in. This would be programmed by the manufacturer and 
would not be changeable in the field. 


A “diode matrix” ROM for holding microcode (you can see these were simple, 
fast, but painful to fix bugs/modify, etc.). 


What a regular programmer thought was “code absolute” would be the next level 
above this, and this would be fetched from main memory, held in an “instruction 
register" and carried out by the underlying microcode (plus some shortcuts of various 
kinds). (For example, the operation field in the instruction word would often lead to a
particular microcode word and the microprogram would start executing there.)
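
As a toy illustration of the layering just described, here is a sketch of a microcoded accumulator machine. Everything specific in it is an assumption made up for the example: the four "gates", the END bit, the three opcodes, and the table layout do not describe any real machine. Each microcode word is a bit pattern, each bit opens one routing gate, and the opcode of an "absolute code" instruction selects the microcode word where execution starts.

```python
# One bit per (hypothetical) routing gate in the microcode word.
MEM_TO_BUS = 0b00001   # put the addressed memory word on the bus
BUS_TO_A   = 0b00010   # latch the bus into the A register
ADD_TO_A   = 0b00100   # add the bus to the A register
A_TO_MEM   = 0b01000   # write the A register to the addressed memory word
END        = 0b10000   # last microcode word of this instruction

# The "wired-in" microcode: words are stepped through in sequence until END.
MICROCODE = [
    MEM_TO_BUS,         # 0: LDA, step 1
    BUS_TO_A | END,     # 1: LDA, step 2
    MEM_TO_BUS,         # 2: ADD, step 1
    ADD_TO_A | END,     # 3: ADD, step 2
    A_TO_MEM | END,     # 4: STA
]

# The operation field of an instruction leads to a starting microcode word.
DISPATCH = {"LDA": 0, "ADD": 2, "STA": 4}

def execute(program, memory):
    """Run (opcode, address) instructions by stepping through their microcode."""
    a = 0
    bus = 0
    for opcode, address in program:
        micro_pc = DISPATCH[opcode]
        while True:
            word = MICROCODE[micro_pc]
            if word & MEM_TO_BUS:
                bus = memory[address]
            if word & BUS_TO_A:
                a = bus
            if word & ADD_TO_A:
                a += bus
            if word & A_TO_MEM:
                memory[address] = a
            if word & END:
                break
            micro_pc += 1
    return memory

memory = execute([("LDA", "X"), ("ADD", "Y"), ("STA", "Z")], {"X": 2, "Y": 3, "Z": 0})
print(memory["Z"])
```

Changing the machine's instruction set here means editing the `MICROCODE` and `DISPATCH` tables rather than the `execute` loop, which is the appeal of making the microcode reloadable.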


Whatever the level of coding, most programmers would not write their code in terms
of raw bits, but would use a “symbolic assembler” to help. These would most often
have a line of symbolic code for every configuration of bits. (I wrote a Quora answer
that showed some of Margaret Hamilton's Apollo assembly code — the listing also 
shows the actual bits.) 


However, it was realized early on that it would be *just wonderful* if you could 
dynamically load microcode! This would allow you to not just customize your machine, 
but to create new machines as needed. The catch was that the memory for the 
reloadable microcode had to be as fast as possible — faster than main memory (at 
least 5 or 6 times faster). This would allow the “inner machine" to cycle faster than the 
main memory could cycle (and thus it would look like HW rather than SW). 


A fun machine was the Packard-Bell 440, which had an interesting compromise: the
lower thousand words of main memory were done in terms of much faster thin-film
technology (vs cores for the rest of RAM). On this machine you would load your
most used subroutines into the fast memory, and they would act like microcode
instructions, etc.


Some machines had “diode boards” which allowed microcode to be switched out by 
hand. 


An interesting “beast” was the Control Data 6600 whose “regular code” was essentially 
microcode, and the poor programmer had to work at a very tough primitive level to 
match up the fast CPU with the slower (but banked) RAM scheme. 


The Xerox Parc machines all had reloadable microcode, and this was used heavily, and 
worked very well. 


The reason for all the different kinds of languages (above the “bits") is to help our 
limited human minds grapple with the problems to be solved with the least amount of 
interference from complex or tricky features the machine might have. (The state of this 
art has hardly advanced in the last 40 or more years — and this is not a good thing at 
all, and results in much too complicated SW.) 


Added Sept 12 2023 — a story that might help. One of my first classes in grad school in
'66 was “Computer Architecture” by Dave Evans, who had actually done two Bendix
computers while a VP there (the G-15, and G-20). 


On the first day he gave us the specs of an architecture and a problem that needed to 
be solved. Each of us had to simulate the architecture to make a running emulation of 
the computer, and then solve the problem by writing a program that the emulated 
architecture could run successfully to get the desired result. 


The only reasonable and reliable vehicle IMO was a FORTRAN that ran on the 
university's Univac 1108 (which was a batch computer — so you had to submit cards 
and wait for a run to see what went wrong). Some of the other students actually used 
the 1108 assembly for this but this was just making more work for themselves. 


However, the double debugging problem here — both the emulator and then the 
emulated program running under it — motivated me to write an actual assembler 
(also in FORTRAN) for the emulated machine. This was an additional pain, but wound 
up paying dividends in getting the whole system to meet Dave Evans’ requirements 
without burning out too many brain cells. 


Alan Kay
Still trying to learn how to think better - Sep 10


How did the first computers look like and what could they do, compared to 
modern-day machines? 

The question explicitly says “first computers” — so, for anyone with a taste for history, 
this should rule out answers for computers that happened after 1955 or so, and I will
concentrate on “first programmable computers”. 


One nice part of the answer is that — with enough memory (e.g. many many magnetic 
tape drives) — the early computers could carry out literally every computation that 
today's computers can. The results would take much (much!) longer to be formulated. 
This principle is due to Turing in the mid-30s. 


In *practical* terms, many (most) of the things we do with computers today — though 
*computable* by the old machines — would not be *feasible* (humans live only so 
long, can wait only so long for interactions, etc.). 


A fun computer system from the mid-50s was for the SAGE air defense system. It 
required an enormous concrete blockhouse as its “case”. The bottom floor was just for 
the power supply. The next floor — about the size of a football field! — contained “the 
Q7 computer” (actually a ganged together double machine with more than 50,000 
vacuum tubes and a physically enormous core memory (the cores were almost 1/2” in 
diameter)). The next floor up was “operations” and the top floor typically held about
150 graphics terminals with pointing devices. 


About 24 of these “blockhouses” were built. The last one was decommissioned in 
1982! 


Here's a picture of one of the blockhouses:

[Image: one of the SAGE blockhouses]


Just to emphasize again: it does not take much of a CPU plus a lot of simple memory 
to allow any and all computations to be expressed. 


Alan Kay
I am the Alan Kay in question. - Sep 10


Does Alan Kay have any updated thoughts on "People who are really 
serious about software should make their own hardware"? 


In brief: I've pointed out — long ago now — that “hardware is just software 
crystallized early”. If we combine this with why we want computers that can take on 
and run software (so we don't have to go to the trouble of making special hardware 
for each function), we can see that the slogan is actually requesting more reasonable
tradeoffs. 


In the earliest days of designing computers, the trickiest part was called “CONTROL”, 
the logic that would read an instruction and open and close pathways to execute the 
instruction. The great Maurice Wilkes came up with the idea of microcode — a fast 
machine within a machine with really simple fast instructions — to regularize as much 
of the CONTROL as possible. Often, simple diodes were used for the (read only) 
control memory. 


It was early realized that it would be “really nice” if the microcode control memory 
were dynamically reloadable, because this would mean that the computer would be 
nicely “parametric” — one could customize it as though it were hardware. The 
tradeoffs here are many, but the main ones often had to do with how fast was the 
fastest memory the microcode could cycle, vs. how fast was the main RAM (which was 
magnetic core storage or worse in the old days). 


The other parallel line of thought which led to the quote above came from early 
attempts to make custom hardware for special functions — a main one was to make 
machines much more suitable for running very high level languages (the nicest and 
greatest one was the Burroughs B5000 by Bob Barton). The bottom line on this 
machine was that “it was the most wonderful thing ever" (but) “it was inflexible to new 
software ideas” (and so probably should be microcoded to get the best of both 
worlds). Barton's next work for Burroughs was just that. 


We (at Parc) had big goals, but wanted them to run in personal computer sized 
packages. To us this meant the small machines had to be microcoded (and the 
budgets meant, the design had to be really good). Chuck Thacker was the main genius 
who accomplished this. 


Also in the 60s, Utah was an ARPA project mostly engaged in inventing what today we 
call CGI (continuous tone real-time 3D graphics — and 2D graphics along for the ride). 
We took a mathematical approach and designed special hardware to do the extensive 
computations involved. Jim Clark, who was part of the project, started looking at a 
single chip design that could do the computations. This was the first GPU. 


Etc. This led to the mantra above. 


It's worth noting that most of computing persisted in a dumb belief: that Moore's Law 
would always catch up to special hardware (generally true) and therefore it is not 
worth it to design special hardware (generally false). We tried to explain that if special 
hardware is going to give you (say) a factor of 5 or so over regular CPUs then you will 
be gaining quite a few years before Moore's Law can equal it. Those few years are 
critical for research, and resulted in much of the hardware forms of today. 
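
The "quite a few years" can be put into numbers with one line of arithmetic. The sketch below assumes that general-purpose performance doubles at a fixed interval; the 1.5-year and 2-year doubling periods are illustrative assumptions, not figures from the answer, and the function name is made up.

```python
import math

def head_start_years(speedup, doubling_period_years):
    """Years until steadily doubling general-purpose hardware matches a given speedup."""
    doublings_needed = math.log2(speedup)
    return doublings_needed * doubling_period_years

for period in (1.5, 2.0):   # assumed doubling periods, in years
    print(f"doubling every {period} years: {head_start_years(5, period):.1f} years of head start")
```

A factor of 5 is a bit more than two doublings (log2 5 is about 2.32), so under these assumptions special hardware buys roughly three and a half to four and a half years.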


The other thing we tried to explain back then is that if you have new ideas and try to 
use old hardware, you are likely to spend a lot of your effort just doing workarounds
and optimizations that are just wasting your precious time. 


I think this is still the case today. 


Alan Kay
Invented and designed a few user interfaces ... - Sep 10


What were the main ideas that eventually led to the design of modeless 
human computer interaction and modeless interfaces? 


The other answers are pretty good, and I've already answered this in Quora 
somewhere else (see if you can find it), but just to be brief here. 


The most important idea was that “modes are useful, until you want to do something 
else”. For example, *inserting text*, or *working on project A*. It's when you want to 
change what you are doing that you forget you are in a mode and get pissed off that 
you have to remember to get out of it before you can do the thing you really want to 
do. 


So “modeless” means that you can start the next thing without explicit exits from
whatever you are doing and the system will clean up after you.


This idea originated at Parc, but was influenced by systems such as RAND's GRAIL pen 
based system in which quite a few interactions were “modeless” in the sense above. 


At Parc, the overlapping window idea I had in 1971 had this character as well. If the
window was of a process, then you could touch another window, and that would 
automatically suspend the old process and start up the new process in the new 
window. This would allow you to move around at will, etc. 


Both Larry Tesler and I were working with text editing and realized at about the same
time that you could avoid being stuck in a “replace or insert mode” if selections
started and ended *between* characters. This would allow selection of 0 to n 
characters, so all input actions would be “replace” and you could always select other 
places without having to get out of what you were doing. 
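
The between-characters idea is small enough to sketch directly. This is a minimal illustration, not the Parc editors' code; the `Editor` class and its method names are invented for the example. A selection is a pair of positions between characters, so it can cover 0 to n characters, and typing always replaces whatever is selected.

```python
from dataclasses import dataclass

@dataclass
class Editor:
    text: str = ""
    start: int = 0   # selection boundaries are positions *between* characters
    end: int = 0     # start == end is an empty selection (a plain caret)

    def select(self, start, end):
        """Select the characters between two boundaries; always allowed, no mode to exit."""
        if not 0 <= start <= end <= len(self.text):
            raise ValueError("selection out of range")
        self.start, self.end = start, end

    def type_text(self, chars):
        """Replace the current selection with `chars` and leave a caret after them."""
        self.text = self.text[:self.start] + chars + self.text[self.end:]
        self.start = self.end = self.start + len(chars)

editor = Editor("hello world")
editor.select(5, 5)          # empty selection: typing "inserts"
editor.type_text(",")
editor.select(7, 12)         # five characters selected: typing "replaces"
editor.type_text("there")
editor.select(0, 5)          # typing nothing over a selection "deletes"
editor.type_text("")
print(editor.text)
```

Insert, replace and delete all go through the same `type_text` operation, so there is no insert mode or replace mode to get stuck in.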


This was a lot like the window idea and some of the good GRAIL ideas, etc., so we 
decided to try to do everything in such a way that new actions would automatically 
suspend old actions, etc. 


This is what we called “modeless”. Larry eventually had a license plate for his car
saying “NO MODES”.


Alan Kay
I am the Alan Kay in question. - Updated Sep 12

Why does Alan Kay call his youth misspent?


That was a joke about how many things I did that *weren't* computing.


Actually, pretty much everything I got fluent at as a “youth” wound up contributing to
my later thinking and results as a computerist. For example, I got degrees in both pure
math and molecular biology (with minors in English and Anthropology), was a pro jazz 
guitarist for a number of years, did a lot of most aspects of theatrical production, 
served in the US Air Force, read a zillion books, etc., and too many more. 


When I accidentally wound up in an ARPA research community grad school (U of
Utah), what the community was doing was so interesting that I started focusing on it,
and using my “misspent youth” to help (for example, the theatre and anthropology 
were great helps in user interface thinking ... math and biology helped with making a 
kind of “object-oriented programming”, etc.) 


I should point out here that “there was no plan, grand or otherwise” — I was simply
following my nose. I did decide to get *fluent* in each interest, but I had no vocational
or other plan. I was just “happily curious”. I had no money, so I had to scratch to
support myself for all of this, but a “starving student” (even almost literally) on a 
university campus is surrounded by the riches of the ages to learn and use. 


One way to look at all this is that the more you get fluent in, the more ways and things 
you will have to make analogies to, and these can lead to very different important new 
thoughts. 


Alan Kay
Have designed a few programming languages - Sep 10


When Alan Kay conceived of object-oriented programming as being about 
objects that hide their state and send messages to other objects, how large 
were these objects imagined to be, and how closely does current teaching 
and programming reflect this? 


Summary: they were always thought of as “all sizes” — this is what messaging allows 
one to think — but it took a while to invent all the software engineering needed to
make the nice idea practical enough for real system building. 


Though I had been a journeyman programmer for a few years, and had seen part of
the object idea a few times (B220 file system, B5000, etc.), it was Ivan Sutherland's
Sketchpad system ca 1962 that got me thinking about modeling and seeing the first 
Simula a few days later (how to do something like OOP by instantiating Algol-60 
blocks) that got me to see the analogies to (a) cells in Biology, and (b) algebras in 
math. 


I was in a maths mood at the time, so it was easy to see that a virtual computer could
model anything on a computer — no matter how large or small — and that they could
be recursively combined using messaging. This provided a really simple framework of
VMs (with the internals being VMs) all on a point to point network (like the logic of
the later ARPAnet and Internet).
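
As a rough illustration of the “virtual computers made of virtual computers” idea, here is a minimal Python sketch. It is not how Simula, FLEX, or Smalltalk were built; the class name `VirtualComputer`, the `send` method, and the `on_...` handler naming are all hypothetical choices made for this example. Each object keeps its state private, is reached only by messages, and its internals are themselves objects of the same kind.

```python
class VirtualComputer:
    """Hypothetical 'object as a small virtual computer' with hidden state."""

    def __init__(self, name, parts=None):
        self._name = name
        # The internals are themselves VirtualComputers (recursive composition).
        self._parts = dict(parts or {})

    def send(self, selector, *args):
        """Deliver a message; the receiver decides how to respond to it."""
        handler = getattr(self, "on_" + selector, None)
        if handler is None:
            return self.on_unknown(selector, *args)
        return handler(*args)

    def on_unknown(self, selector, *args):
        # The receiver, not the sender, handles messages it does not know.
        return f"{self._name} does not understand {selector}"

    def on_count(self):
        """Count this computer plus all nested ones, only by messaging the parts."""
        return 1 + sum(part.send("count") for part in self._parts.values())


cell = VirtualComputer("cell", {
    "nucleus": VirtualComputer("nucleus"),
    "membrane": VirtualComputer("membrane"),
})
tissue = VirtualComputer("tissue", {"a": cell, "b": VirtualComputer("cell2")})

total = tissue.send("count")   # 5: tissue, cell, nucleus, membrane, cell2
reply = tissue.send("fly")     # "tissue does not understand fly"
```

The same `send` works whether the receiver is a leaf or a whole tree of parts, which is the sense in which messaging lets one think of objects as being “all sizes”.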


The state of software engineering in 1966 was not advanced enough for either the very
large or the very small to be rendered as an object. Most of the things “like objects”
had a fair amount of overhead, and were thus rather large entities.


But the “maths” idea was too nice to give up. Meanwhile, I encountered LISP, which
was not just a “neat mood” but an actual example of how to do both some of the
maths and some of the software engineering needed.


For example, one way to look at passing a message is by thinking slightly differently 
about APPLY in the context of an FEXPR LISP. Another example is that most LISPs since 
1.5 have mapped small integers into the protected LISP “entity space” (and the B5000 
maps all numbers into the protected address space, etc.). 


The FLEX language (for the FLEX machine of the late 60s) borrowed more SW ideas
from Wirth’s Euler than anywhere else, and also used ideas from Schorre’s Meta II and
the Floyd-Evans method of parsing.


Dan Ingalls was my indispensable partner for the Smalltalk project at Parc. The first
Smalltalks were rather Lisp-like, and then — as we got more skilled at the SE — we
were able to use ideas from more sources (including the FLEX machine) and from
original inventions.


We could have done a bit more for arbitrarily large objects, but — since our goal was 
inventing modern personal computing, etc. — we mainly aimed at “clean, small, 
simple, and powerful” for Smalltalk. This worked out well for our project at Parc. 


However, it’s worth emphasizing that messaging and late binding allow pretty much 
all improvements within this kind of architecture to be done while the system is in use 
(and without having to be stopped in order to make any changes, fixes, or 
improvements). 
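
A small Python sketch of what late binding buys, under stated assumptions: `LiveObject`, `define`, and `send` are hypothetical names for this example, not Smalltalk's actual machinery. Because the handler is looked up at the moment a message is sent, a behavior can be replaced while the object stays in use, with no restart.

```python
class LiveObject:
    """Hypothetical object whose behavior is looked up at send time."""

    def __init__(self):
        self._handlers = {}

    def define(self, selector, handler):
        # Installing or replacing a handler does not stop the object.
        self._handlers[selector] = handler

    def send(self, selector, *args):
        # Late binding: the lookup happens now, at each send.
        return self._handlers[selector](*args)


account = LiveObject()
account.define("fee", lambda amount: amount // 10)

before = account.send("fee", 200)   # 20, using the first definition

# "Fix" the running system: cap the fee at 5, without recreating the object.
account.define("fee", lambda amount: min(amount // 10, 5))

after = account.send("fee", 200)    # 5, same object, same message
```

Callers that hold a reference to `account` never notice the change except through the new result, since they only ever sent it a message.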


Note: Instantiation of an “idea” is a very good idea. “Inheritance” needs to be really
controlled in order to not turn into a nightmare (this is why I left it out of the first
Smalltalk — Simula I didn't have it, and Simula 67 did). It was pretty easy to see that
“some things could be accomplished” with not just inheritance but multiple
inheritance, but the potential for *mess* was huge. Mathematically, it is a good thing
to contemplate — but as a large idea that is part of questions about abstractions.


I've noted elsewhere that “object-oriented” to me is not a programming paradigm,
but a definitional scheme — it's a way of making things that carry their definitions
with them. A programming paradigm needs to include enough design principles to
guide designers through complexity and scaling. Messaging between instances of
ideas is part of this. Experience would lead me to look at compositions before
inheritance (the PIE system of Goldstein and Bobrow used Smalltalk in an interesting
way to do this). I think I'd look at “modeling time” before looking at inheritance. Etc.
But I do think that establishing a tight enough meaning for inheritance could add
more clarity than murk. I haven't seen a good example of this.


We got away with using inheritance in the later Smalltalks at Parc because we — and 
especially Dan Ingalls — were very careful with it. A later experience with Morphic in 
Squeak was not so happy. 


Alan Kay
I am the Alan Kay in question. - Updated Jul 16

Why did Alan Kay move to London?


Short answer is that my wife — the amazing Bonnie MacBird — decided she
wanted to write a Sherlock Holmes novel in the authentic voice of Doyle's Watson. She
had been a super fan since childhood, and this project had been on the back burner
for years.


As she started off, she had the idea that it might be fun, motivating, inspiring, etc., to 
do some of the writing at the “Sherlock Holmes Hotel” on Baker Street, London. It 
turned out to be nicely funky, an establishment on different levels that had been 
stitched together into a hotel many years before. 


And it was right in the middle of one of London's many nice “neighborhoods” —
Marylebone — with nearby parks small and large, a “high street” with “everything”, the
Royal Academy of Music, Wigmore Hall (one of the best-sounding halls in the world for
audiences of ~550 people), Baker Street Station (which is a main hub with many
underground lines), in walking distance of Oxford Street, etc. Many of the buildings
around the hotel are Victorian flat “blocks” that were built in the 1890s, Holmes’
fictional era.


And there is the larger London, which includes some of the very best English language
theatre in the world, and in great profusion. What had years before been very spotty
food had been converted, partly by the influx of other cultures including Indian and
French, into more good restaurants than one could sample.


Bonnie's writing was going well, and we kept going back. I made up a “Hauptwerk
organ keyboard and pedals” that the hotel stored for us, so I could play and practice
while we were there.


I should mention something ephemeral and hard to pin down here: we — and I
think most Americans — can detect a palpable “spirit of kindness” from most people
in London. Teenagers will actually give up their seats on the underground to older
people. If someone sees you need any help they will come and offer, etc. The “feel” of
London is a good feel.


At one point, I noted that “If we were to live in London half-time we could break even
on a flat rental for what the hotel was costing us”. We looked around a bit and found a
nice little Victorian era flat across the street from the hotel, one block off Baker. This
had a double “sitting room” and Bonnie fixed hers up to be the 221B sitting room of
Holmes and Watson. Here's a view out the window during a very rare snow:


Our flat building is like these. Our small flat reminded me of my student days, and we
both have liked it a lot.


Then came the shock of Trump's election, followed by the pandemic. We almost flew 
back to the US for the pandemic, but decided that both the UK and the US would 
botch it, and that we'd rather try to sit it out in London than in LA. This turned out to 
be a good choice. 


During all this Bonnie had a contract with Harper-Collins and has written 5 highly 
regarded and reviewed Sherlock Holmes novels while we have lived in the flat. The 
Internet and wireless are good in the flat, so we were very much in contact with stuff 
we were doing, especially as the pandemic took hold. 


Then I wound up getting a very rare and bad kind of cancer, and one of the world’s
experts happens to be in London. Before 2013 there was no treatment, so I've been
lucky to have the new “immunotherapy”, which a small percentage of the time will
delay matters if you are in Stage 4. There are side effects, but the cancer itself has
been delayed so far. At my age of over 80, this is a “cycle of life” kind of deal, so I have
no complaints.


So: “London” was not a plan, but just gradually happened, and we are both happy with
how it turned out.


Alan Kay
Still trying to learn how to think better - Jun 28


Why did you major in mathematics? 

I always was interested in math, and especially after reading in a Robert Heinlein scifi
book for youngsters that “No matter how or where you wind up you should
understand mathematics”.


I never had a plan to *be* a mathematician (I think), and the other of my double major
in college was Molecular Biology (with two big minors) and I worked my way through
school and bed and board as a journeyman supercomputer programmer in the
mid-60s.


I also delved deeply into too many other wonderful things as an undergrad (music, art,
theatre, etc.), so when I graduated in 1966 I was too exhausted for anything. I couldn't
stand the idea of right away going to grad school in either of my majors, and I knew I
didn't want a real job then — if ever.


So I moped around for a few months until a thought popped. I could program, but
like most programmers I knew almost nothing about “computing”. The thought was
“why not spend a year relaxing getting a Masters in computing?” That should be easy,
fun, and restorative.


However, I loved living in Boulder, and CU did not have a Masters in computing. So I
went to the library and delved for places that did *and* were above 4000 feet altitude
to have that mountain air (Boulder was about 6500).


There was only one — at the U of Utah across the Rockies on the other slope. I sent
off a letter not expecting anything, and to my great surprise was accepted.


I arrived “knowing nothing” and found that I'd wandered into the ARPA research
community and one of its best venues.


Alan Kay
Have designed a few programming languages - May 14

Since Smalltalk programming language has roots in biology, is it a perfect
match for various computational biology applications (like system biology,
agent-based modeling)? If so, what are some good examples of existing
applications?

Smalltalk does have some of its roots in biology — but certainly when I was thinking
about this kind of language in the mid to late 60s, I was not using much — if anything
— of what I understood about molecular biology (a really early book on Mol-Bio was
Watson's “Molecular Biology of the Gene” in 1965 — which was wonderful, but at
scalings that weren't very applicable to computing at that time).


The parts of biology that had a large influence on me at that time had to do with the 
immense scalings of components and interactions, both within cells — in the millions 
and billions — and in multi-celled organisms — in the millions, billions, and trillions of 
cells. 


I was very interested in developmental biology, but thought that how it went about
things was beyond the scalings of computation back then. What *were* really interesting — and
applicable — were (a) the systems integrity involving even trillions of components 
(large things could still work after decades) — and (b) that the scheme involved just 
one set of DNA in each cell, and parts were controlled and expressed to get about 250 
“kinds” of cells to develop and interact as tissues and with other types — and (c) that 
there were many levels and kinds of “noise detectors and removers” — including at 
the cellular level — that were sufficient to quasi-stabilize the whole “mess” — and (d) 
that most of the metabolism and contents of each cell were used for local stability — 
very little was used to contribute to the whole organism. 


I was also influenced by the “polymorphism” of the endocrine system — but this
didn't get developed as much as it could have.


There were already quite a few “module schemes” in computing — most of them large
and with overheads. I greatly desired everything — including “small things” — to be a
ramification of one kernel type, that could ramify out to the number of types
convenient for an organization. (I got sucked in by math because I could see (via
McCarthy) that this could be done “from out of almost nothing” — and this is what we
did.) Later, I realized that putting a lot of effort into designing a capable “Class object”
would have been a better path.


I think Smalltalk is still worth understanding. However, I personally would start from
scratch with a new language that is much more in accord with “what is actually
needed?” I.e. it would also have modules, but they would be rather different from
Smalltalk's of more than 50 years ago now.


Alan Kay
Still trying to learn how to think better - May 5

Alan, you have written numerous essays and given many talks on the topic 
of computing, but you have never published a book on the subject. Is there 
any particular reason behind this decision? Is it related to your idea of the 
Dynabook? 

Probably the simplest answer to the question is that I like to work on things I have
“urges” to do, and I've never had the “urge” to write a book.


On the other hand, I do have an urge to answer questions, and Quora reveals that I've
written 607 answers thus far (so this is answer 608).


I think that I average about 500 words per answer, so the Quora corpus would be a
thick book of over 300,000 words as a kind of kaleidoscope of paragraphs and topics.


This wouldn't be much of a book. A similar process with talks would yield another kind
of kaleidoscopic — not very useful — book: an hour's talk at about 135-150
words per minute is about 8000 to 9000 words, so 10 talks would be a “book” (and I
think I've given hundreds of talks over the years). And similar arithmetic could be done
on written papers and articles.
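
The back-of-envelope arithmetic above can be checked in a few lines of Python. The only inputs are the figures stated in the answer (607 answers, roughly 500 words each, a one-hour talk at 135-150 words per minute); the variable names are just for this sketch.

```python
# Quora corpus: number of answers times the rough average length.
answers = 607
words_per_answer = 500
quora_words = answers * words_per_answer          # 303500

# One-hour talk at the low and high speaking rates.
talk_minutes = 60
low = 135 * talk_minutes                          # 8100
high = 150 * talk_minutes                         # 9000

# Ten such talks, as a range of total words.
ten_talks = (10 * low, 10 * high)                 # (81000, 90000)

print(quora_words, low, high, ten_talks)
```

So the Quora answers alone come to a little over 300,000 words, and ten one-hour talks to somewhere between 81,000 and 90,000 words.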


A big deal is summed up by my “one sentence bio”: No one owes more to his research
community than I do.


In other words, what I think of as my “actual work” over my life has been done within
communities of like interest and directly with major contributions from great
colleagues.


Two of my favorite books of “real substance” are Newton's Principia, and The 
Molecular Biology of the Cell, by Bruce Alberts, James Watson, et al. 


The first is an incredible tour de force in all dimensions and a kind of work for which
I'm more than a few notches short. (I do love this book!)


The second book is not just long and great, but spectacularly keeps in mind what its 
readers need to be “told as prep” — for example, there is a wonderful early section on 
the chemistry (and atomics) of water which would not be in most biology books, but is 
included in TMBOTC because most readers will likely not remember the details well 
enough from their previous chemistry classes. Having a strong visualization of what 
water does is crucial to the larger understanding of how life works, and this book 
takes the trouble to make sure you have it in mind as you read on. 


The book was first published in 1983; there are now 7 editions (to try to keep up with
the explosion of knowledge since then), and the authors periodically meet to work on
the needed new editions. (For those who are interested, one of the classic editions was
the 3rd ...). How they do it is a mystery, even to them!


I think we can see real analogies to computing here — and also see many areas where
computing falls short of more established real disciplines of Physics, Chemistry,
Biology, etc.


Something interesting and deep to ponder about Biology, is that (a) on the one hand 
“Biology means variation” — evolutionary processes find many viable organizations of 
atoms and molecules to make up living entities, and (b) on the other hand, “Biology 
means speciation” — a larger look than at the variants reveals amazing stable and 
conservative patterns and building principles, many of the most important are 
hundreds of millions of years old, and some billions of years old. 


The latter means a book like TMBOTC can be written in “only” 1000+ pages and 
successfully provide a very strong view of the most important building blocks of living 
entities. The former allows many different examples to be given within the larger 
framework. 


A side comment is to note that the advantage of a physical science is that it has to be 
based on observations on existing entities. This really helps the start of choosing 
useful perspectives on the subject. 


I think such a book is possible about computing, but no one quite like the amazing
Bruce Alberts has stepped forth and mobilized the group of authors needed to pull it
off. I think I could be one of the authors — that would be difficult but rewarding —
but I couldn't play Bruce's role.


In any case, I'm now well into my 80s, and with a not good type of cancer which saps 
energy. So this is a cop out for me now, but it’s a real one. 


A key corollary to the contrast with nature and physical science is that the synthetic — 
artistic — creation and construction in computing gives rise to way too many 
perspectives compared to examining nature, and | think most of these perspectives in 
current day computing are too weak and limited. This makes most books about 
computing very disappointing to read (yet think about how much work had to be 
done to make a book at all!) 


A “really good book” about our field would need to take a deep and reasonable stand
about computing to gain enough context to say much of actual substance.


P.S. I now realize I shouldn't have mentioned the cancer, and just should have said,
“not the energy I used to have”. I sincerely thank the commenters who have wished
me well — but let's get past this if we can: it's just a “cycle of life” thing that is part of
an inevitable process for all human beings.


Alan Kay
I'm the “Alan Kay” in question (try Google for the usual misinformation) - Mar 25


Does Alan Kay see the educational goals behind the Dynabook concept
furthered by AI systems, especially in light of developments like GPT-4? Can
this improve computer interaction and computer aided learning?

One of the earliest stated goals — in the late 60s — for real education for children,
perhaps helped along with a “Dynabook”, was: to have children grow up to think
better than most adults do today.


Most of the time we thought of this as part of a vastly better system and concept of
schooling, not as a replacement for what we thought was a currently broken schooling
system. This was the case even though I had personally learned quite a bit from
reading on my own; I still had quite a bit of help learning to read early, and there were
adults around who could be asked questions and often would point to something
useful to look at further. A well written book could also do a little of this if the author
anticipated some of the questions that the writing would raise, and included further
directions as part of the writings.


Another part of thinking about what it might mean to be “better educated” was 
“epistemological” — one's way of looking at the world of “out there” and the world of 
“our opinions” could and should be vastly changed as “education” progressed. An 
early champion of this idea was Francis Bacon, who called for a “new science” whose 
methods could help us get around the terrible thinking done by our “bad brains” 
(what he proposed was what we just call “science” today, but applied to everything, 
not just the study of external nature, including biology, but also of our own thought 
processes, and how they could be made less vague and error prone). 


We also correlated “civilizations and civilized thinking” with “literacy” — the 
correlation is definitely there — but it wasn't until some years later that the 
breakthrough work of Scribner and Cole — “The Psychology of Literacy” — showed 
very strongly that literacy alone isn't enough to do the job — there also has to be 
present certain kinds of structured schooling in order for reading and writing to do 
their magic. So, in the early years of the Dynabook idea, we were working on just a 
part of the idea, that: there is a new kind of reading and writing brought by 
computing — which includes the old, but adds new very important properties — and 
there is something like a “new literacy” yet to be discovered, which can have a much 
more powerful and beneficial effect on civilization than the reading, writing and 
literacy of the past. 


Most of our designs and systems we built for children were not good enough: it took 
about 25 years to finally come up with a set of ideas and processes that could carry “a 
new literacy” — and by that time we had gotten a lot more sophisticated about the 
difficulties of having good new ideas actually having good new effects on children’s 
thinking and their having fun doing so. This required quite a bit of work with actual 
schools and schooling. 


I think we were able to do an “illustration of concept” — using our system “Etoys” as
employed in the Open Charter School in LA in the “noughts” — but the next phases
were not done (mostly because of lack of “good funding”).


So an important part of the question here is whether we learned enough to answer 
the question in any reasonable way. 


I think one opinion can be given, and this is that putting children in environments that
are devoid of “real and sufficient meaning” — whether in homes, schools, books, or
advanced technologies — and especially any environment that acts like a “Skinner
Box” is a terrible and unethical idea.


The current round of chatbots are not close to what is needed — and note that far too 
many adults are no improvement, and are often even worse. And the assumptions 
behind most schooling today are quite terrible for helping children grow (and with or 
without technology). Much deeper views, understandings, goals, and visions are 
required. 


After trying to understand this area and find ways to get it above the “What is actually
needed?” threshold for more than 50 years now, I still think (a) that Bacon's insights
are valid, (b) that really good schooling with the aid of really good technologies is
possible, and (c) that the current confusions about all of this out in the world are
worse than they were when the Dynabook was thought up in 1968.


P.S. Much the same kind of response could be given today about questions regarding
“What happened to Doug Engelbart’s real insights and goals?”


Alan Kay
I am the Alan Kay in question. - Mar 19


What is the specific reference for Alan Kay's discussion related to people in 
business not wanting to hear complaints and why they should welcome 
them? 


I don't recall saying this.


Alan Kay
I am the Alan Kay in question. - Feb 2


What is the meaning of Alan Kay's quote "the best way to predict the future 
is to invent it"? How can this be applied to any field of work? 


This came out in a fit of frustration during a meeting in the early days of Parc with the 
Xerox corporate planning group. 


I wasn't trying to coin an aphorism, but this accidentally happened.


I happen to really like aphorisms — I think of them as a “special domain of prose that
is also poetic”. They seem to be trying to tell the truth in a compact memorable way,
but I think of them as trying to point the receiver in a direction of more
enlightenment.


As with poetry, too much careful analysis can kill the “glow” that is shining in a 
direction. 


Another one of mine that has made its way around the Internet is “Point of view is 
worth 80 IQ points!”. 


(Note that the sign bit is not specified, but needed — but would kill the glow of the 
larger idea.) 


The big idea is to differentiate between “reasonable and accurate thinking” in a 
random typical context, vs. “the same level of reasoning” in a really strong context. 
This came from trying to get audiences to realize (a) they were even in a context, and 
(b) that everything changed when science was invented as a new context to reason 
within, and (c) that most of our improvements from 200,000 years ago were via 
gradual cultural learning and teaching, and that in the last few thousands of years 
were via “inventions of new methods” that could piggy back on our language and 
cultural apparati. 


I once started a talk with “the easiest way to predict the future is to prevent it!” This
is what most people and organizations do. Then I went into the aphorism above.


But then — this was at NIH in Washington, DC — I reminded the audience that the town
was full of politicians “inventing the future” (but inventing terrible versions for us).


This is why I think a good aphorism is working when it draws attention to thinking
deeper — taking even good ones literally is at one’s own peril.


However — that said — here’s perhaps my favorite one (it's from Einstein): 


As far as the laws of mathematics refer to reality, they are not certain, and as far as 
they are certain, they do not refer to reality. 


This one really gets to the heart of many important matters — far beyond both math 
and science — and works extremely well just as he stated it. 


Alan Kay
Have designed a few programming languages - Feb 2


What does it mean to be an "Alan Kay" programmer? What are some ways 
to become more creative in programming? 


An “Alan Kay programmer” is a person with a few odd, potentially worthwhile, ideas,
and deep enough knowledge and skills to work with a group of people with much
deeper skills.


For example, this question would never have been asked if I hadn't started working
with people like (especially!) Dan Ingalls, Ted Kaehler, etc. I could program pretty well
— especially small mathematically dense stuff, that might have to be in microcode —
but pretty much everyone I worked with could code rings around me — and I wanted
them to! Sometimes I'd code an idea I couldn't get anyone interested in — knowing
that they would most definitely rewrite it to be much better.


Etc. 


This group entity idea was not uncommon in the ARPA/Parc community. Even Ivan
Sutherland, whom I consider to be in a qualitatively higher level in most things, got a
little help to do the amazing Sketchpad, and very much preferred to subsequently set
up groups for really hard really interesting problems (and this despite that he was
extremely self-contained).


Bottom line: if you can achieve synergy in a group you will usually get exponential 
factors of improvement. 


Alan Kay
I am the Alan Kay in question. - Feb 2


Do you agree with Alan Kay's comment that a computer science degree 
teaches students how to write bad code? 


I never said that — we should at least pose an accurate question.


One of the (many) things I have pointed out is that a computer science degree is no
guarantee of “the kind of perspectives that are needed for (good) computing” — this
has parallels with the larger idea that a university degree is no guarantee that the
recipient is now at a “what is actually needed” level of learning and understanding.


I've also said that I nonetheless advocate going to university if possible, because there
are generally more chances to encounter ideas that will trigger real thinking and real
desires to learn.


P.S. By accident, and thanks to Richard Strachan, I found the below from Kurt
Vonnegut right after writing the above (to me this is priceless!):


In 2006 a high school English teacher asked students to write a famous author and ask 
for advice. Kurt Vonnegut was the only one to respond - and his response is 
magnificent: 


“Dear Xavier High School, and Ms. Lockwood, and Messrs Perin, McFeely, Batten,
Maurer and Congiusta: I thank you for your friendly letters.


You sure know how to cheer up a really old geezer (84) in his sunset years. I don't
make public appearances any more because I now resemble nothing so much as an
iguana.


What I had to say to you, moreover, would not take long, to wit: Practice any art,
music, singing, dancing, acting, drawing, painting, sculpting, poetry, fiction, essays,
reportage, no matter how well or badly, not to get money and fame, but to experience
becoming, to find out what's inside you, to make your soul grow.


Seriously!
I mean starting right now, do art and do it for the rest of your lives.


Draw a funny or nice picture of Ms. Lockwood, and give it to her. Dance home after 
school, and sing in the shower and on and on. Make a face in your mashed potatoes. 
Pretend you're Count Dracula. 


Here's an assignment for tonight, and I hope Ms. Lockwood will flunk you if you don't
do it: Write a six line poem, about anything, but rhymed. No fair tennis without a net.
Make it as good as you possibly can. But don't tell anybody what you're doing. Don't
show it or recite it to anybody, not even your girlfriend or parents or whatever, or Ms.
Lockwood. OK?


Tear it up into teeny-weeny pieces, and discard them into widely separated trash 
receptacles. You will find that you have already been gloriously rewarded for your 
poem. You have experienced becoming, learned a lot more about what's inside you, 
and you have made your soul grow. God bless you all!" 


Kurt Vonnegut 


Alan Kay 
Still trying to learn how to think better - Updated Feb 5 


How did Alan Kay recruit and interview his researchers at VPRI? 


I just noticed that I have been misreading this question — or it got changed? In any
case, my answer below was about my ARPA and Parc experiences. I've added a little bit
about Viewpoints Research Institute (which I didn't set up until the early 2000s), and
put this below the main answer.


Parc grew gradually, in part because — as Bob Taylor told Xerox — he “didn't plan to
hire any ‘good’ people”. This shocked them and he explained that he was only going
to hire ‘great’ people, because you can't really approximate a great person with any
number of ‘good’ people, and that ‘good’ people needed to be managed, and he
didn't want people who needed to be managed, etc., and this all meant that it would
take a while to staff up Parc (and that computing research at Parc would be relatively
cheap (because of the scarcity of ‘great’ people)).


And then a tremendously lucky thing happened. A business downturn put the newly
formed “Berkeley Computer Corporation” (BCC - making a new kind of time-sharing
computer) into financial collapse, and Taylor was able to hire the entire 8 or 9 people
— these included Butler Lampson, Chuck Thacker, Peter Deutsch, etc. (i.e. a whole
group of actual ‘great’ people), and in a flash, in Jan 1971, Parc computing research
had a critical mass just a few months after starting up.


I was consulting there at the time, and soon after called up my friends at CMU to ask
forgiveness for breaking my promise to go work there in a few months — I said, “with
Butler and Chuck and Peter and Ed Fiala, etc, we can now do anything”. Part of the key
here was the unique brilliance in all directions of Butler.


Still, after this critical mass windfall, what Taylor predicted happened. The hiring
process at Parc was slow both for Taylor's reasons and because he wanted the existing
staff to *all* really want (really really want) the next person to be hired. Taylor's idea
was that it was worthwhile to put a lot of effort into this, because it would pretty much
eliminate rivalries when the new person came on board. Of course, we complained to
Taylor about the time taken for this, but he was completely right (as usual).


But because of the BCC influx and the few who were already there, lots of really good
stuff got started. There are a few interesting digressions about how the small
headcount, and the way Xerox treated headcount, shaped Parc culture, but I'll
avoid them here.


As for myself, and the group I put together, I'll oversimplify (too much) and just say
“by literally talking and literally handwaving” (the hand usually had the cardboard
model of the Dynabook I had made a few years before after meeting Seymour Papert).


Some of the people who were attracted were already working for others at Parc, and
others came as the result of talks I gave around about “destinies”. I was only
interested in those who got “stars in their eyes”, and I figured I could help them fill in
the rest. This also took quite a while.


However, I got really lucky early on: in a casual hallway conversation, Dan Ingalls
— who was working for someone else — got “stars in his eyes”, and so did Ted
Kaehler. The impact these two had (for the rest of my life) cannot be overstated or
overpraised. Similarly, John Shoch was a student in a class I taught at Stanford, and he
was really interesting. I invited him to do a summer internship and he never left (and
wound up being a major factor in networking of all kinds at Parc).


He thought I'd be interested in meeting Adele Goldberg, who was at Stanford doing a 
project with Pat Suppes, and “to meet Adele, is to meet a truly top-class 
multidimensional force of nature”, so I was very fortunate to hire her (she also had 
“the stars”, etc.). Adele brought Steve Weyer along from Stanford. 


Diana Merry was a PA for the head of the Physics Lab at Parc very early on, and I 
happened to see her do something with a time-sharing typing service that required a 
“programming kind of thinking”. I said to her, “you could be a good programmer”, so I 
hired her away and taught her a few things (which she learned quickly and started 
doing helpful things with Dan and Ted). In addition, Diana turned out to be one of the 
great “bulldogs” of all time — nothing daunted her. 


Dave Robson was a student (I think at UC Irvine). He had heard of Smalltalk and got 
quite far by himself trying to reverse engineer it (just from vague descriptions). We 
invited him to be a summer intern and basically never let him go back. 


So the summary answer is “pretty randomly”, and often via processes like temporary 
internships that allowed us to see how much “depth and fun” a person might be. 
There wasn't a big plan about it. An interesting thing about this group was what I have 
to call “love” — the people really liked each other, despite the huge differences in 
personalities and kinds of expertise they had (and this feeling persists 50 years later). 
This allowed a lot of ground to be covered. 


If we cycle back to “ARPA ways” we will find a very loose organization that was not 
really “hierarchical” nor really “democratic”. Argumentation was almost never for the 
purpose of winning, but to expose more points of view (this is a hard one to explain to 
people). Many many times after long “discussions” the group would then defer to the 
person they thought had the best view of things for the final decision (so “voting” was 
rare, but things worked). 


Bob Taylor was once asked just what he did at Parc (he had been a research 
psychologist, not a technical computer person). He said “I wanted to get great (and 
self driven) people, and they tend to be lone wolves and not really manageable (and I 
didn't want to try to manage them). I wanted to set up an environment that — when 
these people needed to cooperate to make much better progress — they would do 
this”. 


This is what happened — it was quite wonderful — people would really be thinking 
their own thoughts, but would wind up cooperating when that was a good idea. A 
good example is that pretty much everyone at Parc could design and make a 
programming language/OS, etc. (and did). So the agreements there were minimal. 
However, we also had decided to try to make "100" of anything we invented — and 
this meant 100 Alto personal computers, the Ethernet had to allow at least 100 taps, 
etc. This required more cohesion — and this happened — and still without any top 
down pressure or planning. (It's also worth examining the ARPA process over some 
years that created the standards and methods for the Internet.) 


In my view (and Butler's view, etc.) Taylor was a kind of social genius with regard to 
how to deal with talent, and get it to stay original, yet also get it to synergize. 


VPRI was started “AD” (after Disney), and originally was a vehicle for staying with some 
of my favorite colleagues (this was right after the “bust” of the late 90s, and right 
before 9/11). The greatest early aid was my friend Bran Ferren (a very special person in 
so many important ways) who had pretty much been CTO of Disney for a few years. 
He managed to talk Disney out of a few hangar buildings on Disney's 100 acre 
“Imagineering” lot for a new company he and Danny Hillis put together called 
“Applied Minds”. They liked having us “hang around” so Bran offered us free space if I 
was willing to site my NPO VPRI there. (This was a great thing and a lot of fun!) 


But we had no funding, so we scrabbled around in various ways. “Research means you 
can change your mind” so I have always avoided working on problems/goals posed by 
funders, and stuck to this regardless. Kim Rose was completely essential to every part 
of our process — the highest priority part was that we decided to “never miss payroll” 
— and we didn't (sometimes via dipping into my savings, etc.). 


The other principle — that I'd gotten from Dave Evans in grad school — was that you 
should never lie to a funder. His underlying principle was that you can’t do what he 
called “good science” if you get into the habit of lying. So — he said — you have to 
start “learning how to tell the truth in very special ways”. 


VPRI was on a very small scale for much of the time, but top talent was still involved, 
so we did what we could do — some of this was to do some major investigations of 
Etoys in schools, etc. 


Some of the additions were really random circumstances. For example, I had spent 
quite a bit of time off and on in Japan, and even was asked by the Japanese 
government to help them spend some of the funding they wanted to use to widen 
their talent pool. This turned up a number of “real deals” who had very little 
connection with the normal Japanese channels. One was an artist of considerable 
talent, who also had considerable talent with computing, and he was unencumbered 
with any kind of fear of being weird. He did some really wonderful things with us for a 
few years, and wound up going back to Japan, got married, and returned to some of 
the art he had been doing (we miss him very much — he was a great person in all 
respects). 


I will tell another story — later and elsewhere — about my long time relationship with 
Yoshiki Ohshima — a real friend and a great colleague. His advisor in Japan brought 
him to us while we were at Disney with the warning: “Be careful, this young student 
has strong opinions and expresses them often”. My reply was “Great, I'll hire him right 
now!”. Yoshiki is one of the best people I've worked with over my career ... 


I would occasionally teach an “Alan Kay mystery hour” class at UCLA, and did a joint one 
about education, computers, etc., with one class in the US and one in Japan, using 
special equipment. We had a variety of interesting kinds of students, but there was 
one who was *really interesting* — *really really interesting* — who was just sitting in. 
He had a few insights about the computer tool we were using that were startling, and 
I invited him to be an intern at VPRI, wound up being his thesis advisor, and colleague, 
etc. This was/is Alex Warth. 


I should mention Bret Victor. I had heard of him and seen some of his stuff — we got 
together in a park in Palo Alto. About 15 seconds into talking to him, my thought was 
“Wow, this guy should have his own research group — VPRI is too small a venue for 
him” — so I hired him as a consultant, until we wound up finding enough funding for 
YC and Vishal Sikka to set up HARC — and gave him the larger place to play that he 
needed. 


I met Vi Hart — via a lunch we set up — because I wanted to ask her questions about 
her process of making completely brilliant video explanations of things, especially in 
math. She was being funded by Salman Khan — which to me was a wonderful thing 
because I thought Khan's videos were almost “anti-real-math”. 


It turned out that they had hired her, but weren't supporting her. My response was 
“Oh, I'll hire you right now! Come to HARC and follow your instincts!". And she did. 


HARC did not last long, but we were also able to get Dan Ingalls to move there from 
Sun, etc. 


And so on and on... 


A basic idea here is that — although there is nothing really wrong or sinful about improving 
on things that are around (sometimes, if only for safety, this is what you really want to 
do) — it is also the case that we think our largest thoughts by inventing new contexts 
that are often very different from those around previously. 


This is difficult to do (because you are trying to do this with what you learned in 
contexts that need to be escaped). By societal definition it is “delusional” and even 
often considered “crazy”. 


And it is very difficult to tell a "new great-contexter” from an actual “crazy person” (or 
even just a blow hard). Dave Evans liked “other side of lines” types, and he would give 
quite a few a chance, who eventually didn’t work out. He didn’t care when they didn’t 
(and ARPA only gave Dave funding, they didn’t control his process), so Dave happily 
kept on giving people chances — and he wound up with enough "geese who laid 
golden eggs — rather than lead ones or worse" to staff up a first class ARPA project. 
(This was very similar to ARPA’s philosophy on the big difference between “error” and 
“overhead” that I've written about elsewhere on Quora.) 


Alan Kay 
Have designed a few programming languages - Jan 31 


Can I become a competent programmer if I start learning Computer 
Science without attending university lessons, relying only on textbooks and 
internet resources? 


I feel that this is an honestly asked question — and it is a difficult one, in part because 
it isn’t complete (it lacks many side conditions, such as how you are going to get 
experience writing programs, and especially how you are going to get feedback to 
help you improve both your programming and design abilities). 


It could have easily been asked the same way with the substitution of “musician” (or 
“instrumentalist”) for “programmer”. 


I certainly don't think you need to go to university to get fluent at either programming 
or music — but I also think that both really require good advice and feedback from 
fluent enough entities (mostly human, though it's possible to imagine good enough 
computer helpers for quite a bit of both). 


Since most of both will wind up between your ears, the largest part of the process is 
lots of hard work with good examples over a fair amount of time. With good feedback, 
you can accomplish a lot. 


I don't know of good computer “advisors” for either (but I have seen low level 
computer aids for both that do provide some elementary feedback). 


An important caution is that any kind of curriculum — even bad curriculum — is 
difficult to make, and really good curriculum in most subjects is hard to find, whether 
from humans, textbooks, or the Internet. 


Here's an example in music. “They” say: 


“There are four beats to a measure, and a quarter note gets one beat, a half note 
gets two beats, and a whole note gets four beats”. 


*Musically* this is quite untrue — no musician will play four quarter notes evenly to 
perfectly divide the measure into four equal time durations. Musicians use another 
parallel idea called “pulse” (in jazz, called “groove”) that is a human approximation to 
this, but doesn't feel metronomic. Singers are rarely metronomic, and singing is a 
great way to start to become a fluent musician. 


(This is much trickier to talk about today than 50 years ago because of the many 
anti-musical elements that have been forced on the public, like drum machines, auto-pitch 
correction, etc.) 


This area is much like speaking with literal emphasis on strong and weak syllables. 
Consider from Shelley:* 


If Winter comes can Spring be far behind? 


Here we have 5 pairs of weaker and stronger emphases. This is a kind of poetic meter, 
but we wouldn't say it or sing it metronomically. It's worth thinking about how we 
would say or sing it to convey the most meaning, while still making use of the nice 
lilting rhythm as part of the “groove” rather than the control of the “groove”. 


The best way to learn this (and music) is to actually listen to others and to do the acts 
yourself. And try to get someone to help you tune into the meanings and musicalities 
that are only vaguely implied in the written down forms. (Note how little information 
is in the script of a play — there is usually less in most written down music.) 


In learning to program, you are also using written down forms — and things will 
happen because of what you write — but the larger object is not a page of code but a 
running *system*, and the imagination required here can be greatly aided by advice 
and a friendly helper. Learning to think in systems terms from the beginning will make 
an enormous difference in how far and quickly you progress in the art of design. 


A central idea here is that much learning is really autodidactic, but even strong 
autodidacts can greatly benefit by asking questions and getting feedback. 


* Gratefully borrowed from Stephen Fry 


Alan Kay 
Still trying to learn how to think better - Jan 27 


I used chatGPT to summarise some of your opinions. What does the real 
living human think of artificial intelligence condensation of these 
previously expressed opinions? 


Here are two chatGPT summaries, as furnished by Mark Dickson: 


1. Criticism on computer science: Write a criticism of the lack of progress 
in computer science and software engineering as Alan Curtis Kay 
would write it. 


Alan Curtis Kay, a computer scientist and pioneer in the field of object-oriented 
programming, has been vocal about his critiques of the lack of progress in 
computer science and software engineering. 


One of his main criticisms is that the field has become overly focused on short-term 
gains and solving superficial problems, rather than addressing deeper, more 
fundamental issues. Kay argues that the field has become too focused on the "low- 
level details" of technology, such as the speed and efficiency of algorithms, rather 
than considering the larger, more important questions of how technology can be 
used to improve people's lives. He has also criticized the industry's tendency to 
prioritize features over usability and design, resulting in complex and difficult-to- 
use software. In addition, Kay has criticized the lack of emphasis on interdisciplinary 
collaboration in computer science and software engineering, arguing that the field 
would benefit greatly from more collaboration with other disciplines such as 
psychology, sociology, and anthropology. He has also pointed out that the field has 
not done enough to promote diversity and inclusion, which he believes is essential 
for creating truly innovative solutions. 


Overall, Kay's critiques highlight the need for the computer science and software 
engineering field to focus on long-term progress and solving real-world problems, 
rather than being solely driven by the latest trends and technologies. He advocates 
for a more holistic approach that takes into account the larger societal implications 
of technology, and emphasizes the importance of interdisciplinary collaboration and 
diversity in fostering innovation. 


2. Write a critique of object oriented programming as Alan Kay would write it. 


Alan Curtis Kay, who is considered one of the pioneers of object-oriented 
programming, has also been critical of certain aspects of the paradigm. 


He has argued that the way object-oriented programming is typically taught and 
practiced does not fully capture the original ideas and intentions behind the 
concept. Kay has criticized the way that object-oriented programming is often 
reduced to a set of rules and conventions, such as inheritance and polymorphism, 
rather than being understood as a way of thinking about problems and solutions. 


He has also criticized the way that object-oriented programming is often used to 
create overly complex and inflexible systems, rather than being used to create 
simple, elegant, and adaptable systems. 


Kay has also pointed out that the focus on objects and classes in object-oriented 
programming can obscure the underlying principles of computation, and that the 
emphasis on encapsulation can lead to an over-abstraction of the system, making it 
harder to understand, maintain and extend. 


He also has pointed out that the use of inheritance can lead to a proliferation of 
classes and subclasses, making the system hard to understand and maintain, and 
that the use of polymorphism can lead to a proliferation of methods and interfaces, 
making the system hard to understand and extend. 


Kay has also said that object-oriented programming should be seen as a way of 
organizing and structuring code, rather than as a one-size-fits-all solution to 
programming problems. He also has said that object-oriented programming should 
be used in conjunction with other programming paradigms, such as functional 
programming, to create a more flexible and adaptable system. 


Overall, Kay's critiques highlight the need to understand the original ideas and 
intentions behind object-oriented programming and to use it in a way that is 
consistent with those ideas and intentions. He also points out that object-oriented 
programming should be used in conjunction with other paradigms, and that the 
focus should be on creating simple, elegant, and adaptable systems. 


ChatGPT Jan 9 Version 7. 


The first one is more interesting than the second (which includes outright errors, such 
as what I think about encapsulation, and many other things). Both of the examples (I 
think) are much more vague than my view of my own writing style, and they have 
other deficiencies. But, if taken as summaries by another entity (human or otherwise), 
the vagueness is reasonable to expect. 


A key omission (how did the “AI” miss it?) is the central concern I have about 
*systems*, and especially about “what winds up running”, and the “lack of ‘systems 
consciousness’ in the field today”. 


Another is that I've repeatedly stated that I don't think OOP is really a “programming 
paradigm”, but is rather a universal definition method (a tortured comparison would 
be to something universal like NAND, and a particular approach to making a 
computer). 


The danger here is that “form very often rules over content” in most of human life, and 
the current “AI”s have been aimed at mimicking form rather than at understanding 
content. Given the general state of the public outlook, this is a disaster already well 
along. 


Alan Kay 
Have designed a few programming languages - Jan 13 

What was the source of documentation and programming knowledge 
before the internet? How did daily coding look like without having access 
to Google? 


The major HW manufacturers had "SHARE" organizations of their customers to share 
lore and sometimes code. 


In the research game, there were papers from the several main conferences and the 
ACM, etc., that imparted techniques (these were invaluable). 


For down and dirty, one would read code (and most of the code in the 50s and 60s 
was still machine code in one assembler or another). Sometimes these were 
commented (and often the comments were not updated when bugs got fixed, etc.). 


Typical code looked like this: 


[Image: a scanned page of an assembly-language program listing] 


(This is some of Margaret Hamilton's code for the Apollo moon mission Guidance 
Computer.) 


This had some extreme side conditions, including a tiny slow “odd" computer, and 
that most of the code had to be put into a very difficult to make ROM (of “core rope” 
memory). But listings like this were typically what one wanted to look at, especially the 
semi-readable assembly code columns in the middle of each page. 


Alan Kay 
I'm the “Alan Kay” in question (try Google for the usual misinformation) - Jan 7 


Are the slides (Squeak image) available for the Alan Kay talk linked in 
comment? 

They aren't right now, but — without promising anything — something like them 
might be made available in 2023 or 2024. Here's the talk in question: (starts at 2:15 if 
the link doesn't do it). 


Smalltalk had unlimited “projects”, which can be thought of as “multiple desktops 
persistent over time”, where, in each, all the resources of Smalltalk were available 
*live*. This is a workflow idea that would be very nice to have in modern systems. 


This (also) means that each project can be used to show an idea — and that (anything) 
can be interacted with during a talk. Going from one project to another is a bit like 
going from one slide to another in a presentation system, the difference being that 
you don't have to “can” anything and are not limited to e.g. “build” features. 


I have used versions of this for many of my talks over the years (presentation systems 
are terribly confining, and “anti-computer”). 


This one was quite a bit of fun because it uses one of the first Smalltalks (1978, and 
the one Steve Jobs saw) to have “projects”. I was able to do everything without having 
to add any new primitives to that Smalltalk (the talk was aided because the old 
Smalltalk was now running on a machine with lots of memory — so it was possible to 
include many more bitmap pictures than we had room for back then). 


However, these freedoms also encourage “Easter eggs” etc., and mean that many 
things — often ones that seem magical — can be done in the order the speaker chooses on 
the fly. 


Today's presentation systems, on the other hand, have a useful feature that things can 
be advanced by a single control — such as <space bar> or clicking. This is especially 
useful when someone other than the author wants to run through the presentation (as 
in the question here at hand). 


This would require a few more features in the live Smalltalk. For example, we would also 
need to “can” interactions with the system, and this would generally require not just 
capturing them (easy enough), but also allowing them to be edited (a much larger chore). 


For this historical system, this would amount to adding important features that we 
didn't have or use back then (things would start to get confusing as to just what 
things were like back then). We would not have to change the kernel, but these 
changes would be out of the scope of back then. 


On the other hand, this year 2022 was the 50th anniversary of Smalltalk, and next year 
is the 50th anniversary of the Alto. Besides museum exhibits and celebrations, the 
interactivity of the Alto invites an online site so that more people can experience what 
things were like when the Alto started working in 1973. 


The (very complete) precursor of Microsoft Word (called “Bravo”) is very usable today 
— but it would still require quite a bit of scaffolding to make enough of it easily learnable 
for visitors to an Alto website. The old Smalltalks — because they are live 
programming and development systems — would require lots more scaffolding. And 
a use of an old Smalltalk — as in the Ted Nelson tribute I did above — would require 
even more to allow visitors to smoothly run through it. 


However, it would be great to have all the additional scaffolding done for a historical 
live interactive website — so we are looking at just what would be good (and possible) 
to do. 


We are all indebted to Dan Ingalls for his many versions of the Smalltalks, and now, 
even more, because he has made a “Smalltalk Zoo” of live interactive versions of many 
of the old systems. I think the latest version of this is available through the Computer 
History Museum's web site — Phillip Remaker found this link to the Zoo: 


https://smalltalkzoo.thechm.org/ 


Alan Kay 
I'm the “Alan Kay” in question (try Google for the usual misinformation) - 11mo 


What is the history of the first personal computer? 

I think we first have to come up with an agreed on definition for “personal computer”. 
What most people think of is something which cost originally in the range of a 
television set — but I think we could extend that to include other “personally owned 
things” such as automobiles and houses (but not Lamborghinis or office buildings, etc.). 


Using this expanded range, we could look at machines like the G-15 and the LINC, etc. 


Mary Allen Wilkes’ LINC computer in her home (invented by Wes Clark) ca ‘62. 


Please write comments to help the definition, and I'll try to answer. 


Alan Kay 
Have designed a few programming languages - 11mo 


What does Alan Kay think of Ruby programming language, since it was 
inspired by Smalltalk? 


As I said to a similar question about Swift: I don’t think my opinion matters. 


However, I think the Ruby people did a number of nice things with their design. I think 
that more kinds of *safe reflection* and *workable MOP*, etc., would enhance the 
good that is already there. 


Mainly though, I think that all the programming languages I’m aware of today — 
including Smalltalk and Ruby, etc. — have failed to be redesigned and reinvented to 
reflect how Moore's Law has affected computing resources. 


I think that the support for “Moore's Law” software pretty much stopped in the early 
80s, and the new generation of computerists — many now in corporations — started 
to optimize and patch the older ever more obsolete forms. Once you've put on these 
hats, you can’t even find your “design hat”. 


— Added Dec 3 2022 ———— 


This is in response to the 5 current comments, which are all about parallelism. This is 
about optimizing, and what is needed is something qualitatively different. 


Namely, that to actually do a real Moore's Law for SW it is necessary to exponentially 
change/raise the level of *meaning* and *intent* that can be given by the 
designer/programmers. In other words to start building programming systems that 
know about many things beyond how to give variables values and create simple blind 
relationships. Such a system would automatically avoid most of the recent large 
company systems crashes. This is analogous to how an embedded good garbage 
collector simply does not allow any kind of memory crashes. 


John McCarthy wrote a paper in 1958, “Programs with Common Sense”, and spelled out 
a lot. He invented Lisp in order to work on this kind of programming. 


Sketchpad was a different approach to the same understanding of the need for a 
qualitative jump. Basically, in Sketchpad, the programming was done in terms of 
requirements for what you wanted, and Sketchpad had three problem solvers that 
could find solutions to the multiple requirements given to it. Its lower level was the 
first OO framework, but the programming was cosmically aimed at the future where 
the main programming is all done in terms of *meaning* in ways that are quite 
separated from any kind of optimizations. 
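To make “programming in terms of requirements” a little more concrete, here is a minimal sketch in Python. It is only an illustration and not a reconstruction of Sketchpad: the `solve` function, its parameters, and the two sample requirements are all hypothetical, and the solver is a plain numerical relaxation loop. The point is that the caller only states what must be true, and a separate solver does the finding.

```python
def solve(values, requirements, steps=2000, rate=0.1, h=1e-6):
    """Nudge the values downhill on total squared error until the requirements hold.

    Hypothetical helper for illustration; each requirement returns 0 when satisfied.
    """
    values = dict(values)

    def total_error(v):
        return sum(r(v) ** 2 for r in requirements)

    for _ in range(steps):
        for name in list(values):
            base = total_error(values)
            bumped = dict(values, **{name: values[name] + h})
            slope = (total_error(bumped) - base) / h  # finite-difference estimate
            values[name] -= rate * slope
    return values


# The "program" is just a statement of what is wanted.
requirements = [
    lambda v: v["x"] + v["y"] - 10,  # x and y must sum to 10
    lambda v: v["x"] - v["y"] - 2,   # x must exceed y by 2
]

result = solve({"x": 0.0, "y": 0.0}, requirements)
print(round(result["x"], 3), round(result["y"], 3))  # 6.0 4.0
```

Note that nothing in the requirements says how to compute x or y, so the solver could be swapped for a better one without touching them.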


Alan Kay 
Had something to do with “Object-Oriented Programming” - 11mo 


How an object is created? What are the important parts necessary to create 
an object? 

This is an interesting question — I'll take it to include “... on a vanilla computer with no 
‘object language’ already present”. 


My view back in the 60s: since any computer with enough memory can simulate any 
other (including future computers), then a simulated (“virtual”) computer can also 
define every part of any computation — and this includes not only processes, but 
what was then called “data”. The simulated computers would be running code of 
some kind, so multiple processes would have to be protected from smashing each 
other, etc. So this would be like the dividing of a time-shared computer into separate 
protected processes that can communicate with each other. The communications 
would be much safer if they were “requests” rather than “commands”. And so forth. 


It should go without saying that any useful computation process should also be able 
to be instantiated in various ways. 


Basically, a software version of computers on a network or bus (and very much 
influenced by the ARPAnet that was then being worked on). A good larger vision of 
this is that “everything should be a server". 


This is really easy to do, except for: safety, and efficiency of time and space. So — in 
practice — a lot of software engineering is required to make the schemes practical 
and useful. Once done, there is an opportunity to make highly scalable systems from 
the objects — this is a design task that is quite separate from making objects (terrible 
systems have been done using object components, etc. — and a few nice ones as 
well). 


You can figure out what to do next to allow “messages"/"requests” to work — note 
that the form of a procedure call will work — it includes a “target” and parameters for 
the request, etc., but that the semantics has to be expanded in a number of ways. 


For example, the target has to be able to ignore the request (it may be from a 
dangerous sender), the target may need to acknowledge the request, but delay the 
actual answer or service, the target may need to gather knowledge over time to help it 
serve, and so forth, the target may need to remember and undo its services, ... A 
really simple one is that standard subroutines are set to establish a “history chain” of 
calls, that then needs to be unwound in reverse order — this has to be modified at 
several levels — to allow other processes to run in various semi-concurrent ways — to 
not require a history chain (maybe something more comprehensive, etc.). The 
instantiation of procedures has to go beyond re-entrancy and recursive use, etc. 
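A few of these expanded semantics can be sketched in a handful of lines of Python. This is a toy under stated assumptions, not how any Smalltalk did it: the `Request` and `Node` classes, the `trusted` set, and the `add` selector are all hypothetical names invented for the illustration. It shows a target that can ignore a request from an unknown sender, and that acknowledges a request now but delivers the answer later.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    sender: str     # name of the requesting node
    selector: str   # what is being asked for
    args: tuple = ()


@dataclass
class Node:
    """A tiny 'virtual computer': private state plus a mailbox of pending requests."""
    name: str
    trusted: set = field(default_factory=set)
    mailbox: deque = field(default_factory=deque)
    _total: int = 0  # private state, only changed by the node itself

    def send(self, request):
        """Deliver a request. The target, not the sender, decides what happens."""
        if request.sender not in self.trusted:
            return "ignored"         # the target may refuse a dangerous sender
        self.mailbox.append(request)
        return "acknowledged"        # the actual answer comes later

    def step(self):
        """Serve one pending request, if any, and return its answer."""
        if not self.mailbox:
            return None
        request = self.mailbox.popleft()
        if request.selector == "add":
            self._total += sum(request.args)
            return self._total
        return "not understood"


counter = Node("counter", trusted={"alice"})
print(counter.send(Request("mallory", "add", (100,))))  # ignored
print(counter.send(Request("alice", "add", (2, 3))))    # acknowledged
print(counter.step())                                   # 5
```

Because the sender only gets back “acknowledged”, no call chain has to stay open while the target works, which is one small way of not requiring a “history chain”.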


The basic idea here is to think long and hard about *systems* and what is needed to 
make them, and then to make a basic “node” that can do any and all that is needed. 


A more sophisticated scheme could be to send requests to a “broker” process that can 
find the best servers (take a look at what LINDA does, etc.). This could be “call by 
*need*” — in other words, you don't know the name of the service, but you can 
describe what you need, and the brokering processes will do the finding for you ... 
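Here is a minimal sketch of the brokering idea in Python. It is an assumption-laden toy and not LINDA's actual tuple-space model: the `Broker` class, its `offer` and `request` methods, and the use of descriptive tags are all hypothetical. The requester never names a service; it only describes what it needs.

```python
class Broker:
    """Matches a described need to a registered service (hypothetical design)."""

    def __init__(self):
        self._services = []  # list of (capability tags, handler) pairs

    def offer(self, capabilities, handler):
        """A server advertises what it can do as a set of descriptive tags."""
        self._services.append((frozenset(capabilities), handler))

    def request(self, need, *args):
        """Find a server whose tags cover the need; prefer the fewest extra tags."""
        need = frozenset(need)
        candidates = [(caps, h) for caps, h in self._services if need <= caps]
        if not candidates:
            return None  # nobody can serve this need
        _, handler = min(candidates, key=lambda c: len(c[0]))
        return handler(*args)


broker = Broker()
broker.offer({"sort", "numbers"}, sorted)
broker.offer({"sort", "numbers", "descending"},
             lambda xs: sorted(xs, reverse=True))

print(broker.request({"sort", "numbers"}, [3, 1, 2]))     # [1, 2, 3]
print(broker.request({"sort", "descending"}, [3, 1, 2]))  # [3, 2, 1]
print(broker.request({"translate"}, "hello"))             # None
```

A real broker would need much richer descriptions than flat tags, but even this shows the requester staying ignorant of which server did the work.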


Bottom lines here are that (a) “something vaguely like an object” can easily be made 
from any vanilla computer resources, (b) most such things in most current “OOP” 
languages are neither very good renderings, nor is their use enlightened, (c) in the 
current day of many doublings via Moore's Law since 1966 (about 35 doublings 
perhaps), a version of OOP is needed to deal with current scales and needs, (d) having 
great objects does not say all that is needed about how to organize them — this is 
one of the greatest ongoing issues (i.e. “OOP” is not really a programming paradigm, 
but a universal process definition scheme). 


Alan Kay 
I am the Alan Kay in question. - 1y 


What does Alan Kay think of Neil Postman’s invited talk at Apple in 1993, 
and are his main arguments even more pertinent & relevant now 30 years 
later? 


I think the referred to talk was: 


(there is some noise on the left audio channel — balance to the right to help.) 


Neil and I were good friends. I set up this talk and invited him to have a go at a room 
full of rather set opinions. 


He once told me a story that gives a lot of insight into him and his approach. When 
Neil was in grad school in the late 50s, he and his fellow student Charlie Weingartner 
got interested in Marshall McLuhan and started following him around to his talks. Neil 
said they soon noticed that McLuhan did not respond to questions or arguments from 
the audience. Instead, he would say “how about this one?”, and come up with another 
of his “from right field” weird cryptic aphorisms. Neil and Charlie eventually realized
that McLuhan didn’t care whether anyone agreed with him — what he cared about 
was whether the audience was actually thinking at all! He viewed his purpose as a 
wake-up call. 


It's also important to understand that Neil Postman not only could really write, but 
that he was extremely conscious of his writing process, control of rhetoric, and the 
vast differences between oral and literate societies, world views, and expressiveness. 
The result was not only some of the most readable serious prose of his time, but also
oral talks that were actually written beforehand, and so well that he could read them
and still be “oral” for his audience, but “literate” as to depth.


In my opinion, this made him the best, most accessible, and most important public 
gadfly of his time. He could write “Teaching as a Subversive Activity” at the beginning
of his career, and a bookend at the end: “Teaching as a Conserving Activity”, and
make both views even more valuable together, each supporting the other, than their
initial seeming opposition would suggest.


He could watch television, and condemn it without being a hypocrite. He could 
inveigh against “technology” in deep ways while still using it, and without becoming a
Luddite. Neither “conservative” nor “liberal” fit Neil well.


He was essentially an Artist reporting what he saw and thought about during his life. I
found him to be one of the most endearing, genial, and civilized people I'd ever met.


He knew — as did McLuhan — that, down deep, public utterances are at best 
“commercials” and “carrots” for the much harder and longer pondering that most of 
us need to — have to — do to start grappling with the real issues. I miss him very
much. 


So, with the above as context, what do viewers think Neil is trying to get the audience
to think about?


Alan Kay 
Still trying to learn how to think better - 1y


Do you think that Alan Kay was more focused on children than education? 
Why? 

H.G. Wells said it better than I can: “Civilization is in a race between education and
catastrophe”. His notion that education could be a deterrent to catastrophe was part 
of the age old “nature vs. nurture” questions that gained ground as Anthropology and 
Psychology delved deeper into trying to understand humans, their cultures, and 
behaviors. 


Children who are deprived of a human cultural surround when growing up — such as 
“Genie”, who was raised in a closet with no contact, or the various “wolf children”, who 
were raised with animals — do not develop normally. Much of what we consider to 
“be human”, such as language, etc., are not embedded by genetics in a child's brain at 
birth — instead, genetics embeds “desires, interests, propensities, etc.” that draw the 
child's attention to specific examples in the environment to be learned as behaviors. 
For our species, this process is so strong that it forms the “prime idea of 
anthropology” — a child at birth can be taken anywhere on earth to another culture, 
and will grow up as a member of that culture, thinking the thoughts of that culture, 
being that culture, etc. 


A key idea here is that we humans treat what is more or less constant around us as 
“normal” — and, critically, we shortcut “normal” in our minds to “reality”. Because our 
nervous systems are tuned to differences, we pretty much disappear “normal” as an 
idea that is relative to surrounds, especially cultural surrounds, and simply take most 
of our circumstances as “real” rather than “mental constructions”. What we believe 
mostly determines how we see and interpret the world. 


The deep “nature vs. nurture” question is: to what extent can our impulses from 
genetics be constrained and augmented by learning — and especially: early learning? 
For example, could the US Navy SEALs slogan actually be true? I.e. “Under pressure,
you don't rise to the occasion, you sink to the level of your training”.


There are many examples where training/education does augment — even replace — 
some of the behaviors arising from our genetics. But could this be enough to make 
important differences in larger human societies? 


I don't know the ultimate answers here, but have long been interested in the wide
spreads of behaviors that are associated with “humans in the raw” and “humans in
society”, and especially the extremes, which often cut both ways. An often examined 
example is the transformation in the 20s and 30s — in less than a generation — of 
Germany from one kind of stability to a very different dangerous society — and then, 
after WWII, back to a very different kind of system from either. 


Readers who like to think in terms of systems will reflect that examples like these don't 
necessarily mean that every individual in e.g. Germany, was swung from one extreme 
to the other and back again — but that the societal system — the overriding culture 
— manifestly did (and rather quickly). 


Another systems perspective of this has to do with thresholds for individual elements of
systems that are needed to have the larger organization be stable enough to “work”. 


A famous reply by Jefferson in response to whether democracy could ever be made to 
“work” was: “I know of no safe depository of the ultimate powers of the society
but the people themselves; and if we think them not enlightened enough to 
exercise their control with a wholesome discretion, the remedy is not to take it 
from them but to inform their discretion by education”. 


I think it's fair to say that this central idea — that education in a democracy, above all,
has to raise citizens above the threshold needed to participate — has been absent in US
education for so long that the education establishment doesn't realize it ever existed 
as an idea and goal. 


This doesn't mean teaching a party line, but it does mean teaching e.g. how to argue
with others in such a way as to illuminate difficulties to aid solutions (as opposed to
“trying to win an argument”).


This idea about arguing to make progress was deeply embedded in the ARPA research 
community, and was a vital part of its success: it allowed the widest view of ideas and
issues to be brought forth, fostered cooperation, and damped out fruitless contention
and competition of egos. (The ARPA process apparently started in the radar project at 
MIT during WWII.) 


An interesting question beyond whatever the needed educational/training processes 
might be, is “when in life can they best be taught and retained?”. 


Most indications point to childhood, especially the first 7-10 years — this is when 
many mental commitments to “world view” (aka “reality”) are most strongly made.


My own interests in all this went from “vague understandings” to “vivid thoughts” 
after meeting Seymour Papert in 1968, and seeing what he was trying to do with 
children and thinking, via giving them a rich environment to play with ideas/processes 
through the interactive LOGO computer system. 


A vital point about what Seymour was doing is that — even though he was an 
excellent mathematician and came up with many mathematical examples for LOGO — 
his main aim was “world view” (a fancier larger term for this is “epistemological
surround”).


He wanted to find ways to qualitatively enrich what children grew up taking as 
“reality”. When he said that “the best way to learn French is to grow up in France”, he 
didn't just mean the language, but also the culture (metaphorically meaning: modern 
ways to view and think about our situation). 


This strongly echoed Maria Montessori, whose main underlying aim was the same — 
she went about it by pointing out that “you can't learn the 20th century in a 
classroom”, but must somehow be able to grow up in it. She decided to embody this 
in the fabric of her school to make it *be* “the 20th century”. 


There are literally many more critical ideas that are part of this: too many for a Quora 
sized answer. 


A big one — in size and amount of understanding required — is how we are affected 
by “media” — intermediary things, tools, processes, ideas — that are part of our
environmental surround. McLuhan quipped “We become what we behold!” via
intermediaries — in other words, there is nothing neutral about tools/communication 
systems — what counts is what changes happen in our brains to learn anything,
especially tools and media — but he said almost nothing about our difficulties in 
really beholding. 


For example, what we most learn from a tool — such as a hammer — is that we “can
hammer” and “how to hammer” — the pragmatics of “hammer” — but we have great
difficulties in coming up with larger ideas about “hammering” (i.e. don't hammer 
others when you get upset, etc.). Doug Engelbart realized this in his first papers about 
“augmenting humans”: that providing humans with ever more powerful tools is a 
disaster — between the human and the tool has to be education/training, not just in 
use, but in “use for what?”. 


This means that when a new powerful tool/medium comes along, the burden is not 
just how to teach the use of it, but the “use for what?” and the “what is it really?” of it. 
In other words, we need to learn how to insert barriers that prevent simple-minded 
pragmatism between us and our ever more powerful intermediaries. 


Vi Hart: “We must ensure that human wisdom exceeds human power”. 


Much of modern science is about how and why to go past “the world is as it seems” to 
understanding that the world is “mostly not” as it seems — much of what's important
is “behind the curtain” — and thus everything needs to proceed much more slowly 
and deliberately in order to get more valid views to “make the invisible more visible”. 


Thus the main “world view” of science strongly obtains in most other walks of life as
well, especially in matters of society, its “discretion”, and how to raise it. The
perspectives of the new branch that is systems science can be particularly helpful.


We should be able to see that simply “teaching and using tools” is not just futile, but
very dangerous with powerful tools (this includes not just recent tools — such as 
networked personal computing — but also older tools — such as reading and writing). 


So, in the race between education and catastrophe, we must learn to deeply understand 
catastrophe, education, intermediaries, society, humanity, and children, and then take 
major actions for all. 


Alan Kay
Had something to do with "Object-Oriented Programming" - 1y


What did Kay mean by “I wanted to get rid of data. The B5000 almost did 
this via its almost unbelievable HW architecture.”? 


One way to think about classical “data” is that it is something passive that can be 
munged by an assignment operation. Procedural languages generally have variables 
and data structures that can be munged — so-called “OOP” languages generally 
permit “setters”, which also allow munging. 


The Burroughs B5000 computer with many very different features was revealed in 
1961. It was designed to efficiently run Algol 58, and much of Algol 60, more or less 
directly. Much has been made about its hardware stack, but this was just the tip of the 
iceberg. The most interesting features of the machine had to do with safety of both 
state and processes, and especially how memory was structured. 


A word (48 bits) in memory was marked as either a number or a “descriptor” — a
protected pointer to hardware or software entities. In “regular mode” these pointers
could not be forged. One use was to implement virtual memory, via a “presence” bit in
a descriptor. If on, the core address of the segment was in the rest of the descriptor;
if off, the rest of the descriptor had the disk location of the segment, and the segment
was automatically brought back in.
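
The presence-bit mechanism can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `Descriptor`, `Memory`, and the address scheme are invented for illustration. It does not reproduce the real 48-bit word layout, and it never evicts anything.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    present: bool   # the "presence" bit
    location: int   # core address if present, otherwise a disk address


class Memory:
    def __init__(self, disk):
        self.disk = dict(disk)  # disk address -> segment contents
        self.core = {}          # core address -> segment contents
        self.next_core = 0      # next free core address (toy allocator)

    def fetch(self, d: Descriptor):
        if not d.present:
            # Segment is absent: bring it in and rewrite the descriptor.
            segment = self.disk[d.location]
            core_address = self.next_core
            self.next_core += 1
            self.core[core_address] = segment
            d.present, d.location = True, core_address
        return self.core[d.location]


mem = Memory({7: [1, 2, 3]})
d = Descriptor(present=False, location=7)
first = mem.fetch(d)    # loads the segment from "disk"
second = mem.fetch(d)   # now served directly from "core"
```

Code that holds the descriptor never sees whether the segment was in core or on disk. It just asks, and the descriptor is updated behind its back.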


An important idea is that the “byte-codes” of the B5000 did not contain storage
references and could not directly see memory. So the OS (also written in an extended 
Algol) would *grant* an environment for code to run that the code could not hurt. All 
this was the basis for “capabilities”, and made the B5000 almost uncrashable in 
practice. 


I should also mention that the machine had at least two CPUs and an automatic
process switching mechanism. 


A very interesting feature was how the B5000 handled code such as 
A <- B

The byte codes looked like 

<name call: A> <value call: B> <operation: <- > 


The first syllable wants an address for A on the stack. Next a value of some kind is put 
on the stack (B stands in for any expression). Finally the <- is done to the top two 
items in the stack. 


The interesting wrinkle is that if A contains a procedure descriptor, that procedure will
be called with a flag indicating that a *name* should be the result. I.e. that the A is on
the left side of the assignment statement. (This later led to the idea of “L-values” and
“R-values” in languages like BCPL).


This meant that the B5000 could hold off munging and protect structures at will. This 
was pretty neat! 
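
A toy stack machine in Python can make the name call / value call sequence concrete. It is a loose model under stated assumptions: the class `Machine`, the `want_name` flag, and the `guarded_a` procedure are all hypothetical, and real B5000 syllables and descriptors worked at a much lower level.

```python
class Machine:
    """Toy stack machine; not the real B5000 instruction set."""

    def __init__(self):
        self.memory = {}  # name -> plain value, or a procedure (a callable)
        self.stack = []

    def name_call(self, name):
        cell = self.memory.get(name)
        if callable(cell):
            # A procedure sits in the cell: call it, flagging that a
            # *name* (somewhere to store into) is wanted.
            self.stack.append(cell(self, want_name=True))
        else:
            self.stack.append(name)

    def value_call(self, name):
        cell = self.memory[name]
        if callable(cell):
            self.stack.append(cell(self, want_name=False))
        else:
            self.stack.append(cell)

    def store(self):
        # The "<-" operation: applied to the top two items on the stack.
        value = self.stack.pop()
        target = self.stack.pop()
        self.memory[target] = value

    def assign(self, left, right):
        self.name_call(left)
        self.value_call(right)
        self.store()


def guarded_a(machine, want_name):
    # Hypothetical procedure standing in for A: it redirects stores to a
    # backing cell, so A itself is never overwritten.
    if want_name:
        return "A_backing"
    return machine.memory.get("A_backing", 0)


m = Machine()
m.memory["B"] = 5
m.memory["A"] = guarded_a
m.assign("A", "B")   # A <- B
```

After the assignment, the procedure is still installed in `A`, and the value has landed wherever the procedure decided it should go.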


I thought about this a lot as a grad student. One example that couldn't be done
“right” with this scheme was to model sparse arrays, because the name call was early
and separated. This eventually got me to abandon the idea of “<-” as an operation,
and to think of it as a kind of parameter:


A <- B should be something like A("<-", B) ... which (wonderfully) could actually be
done in Algol.


In other words, the idea is to have encapsulated entities which can only receive 
messages as requests (not commands), and they get to control what state and state 
change means. I thought of this as abandoning imperative programming for “request
programming”. So the entities would be much more like “servers” than passive 
vulnerable bits. 
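
Here is a minimal Python sketch of the A("<-", B) idea applied to the sparse array case. The selectors `"at"` and `"stored"`, and the rule that storing the default value frees a cell, are assumptions made for the example rather than anything from Algol or Smalltalk.

```python
class SparseArray:
    """An entity that receives requests; callers cannot touch its state."""

    def __init__(self, default=0):
        self._cells = {}        # only non-default cells are kept
        self._default = default

    def __call__(self, selector, *args):
        if selector == "at":
            (index,) = args
            return self._cells.get(index, self._default)
        if selector == "<-":
            index, value = args
            # The entity decides what "store" means: writing the default
            # value releases the cell instead of keeping it.
            if value == self._default:
                self._cells.pop(index, None)
            else:
                self._cells[index] = value
            return self
        if selector == "stored":
            return len(self._cells)
        raise ValueError(f"request not understood: {selector!r}")


A = SparseArray()
A("<-", 1_000_000, 42)          # a request, not a raw memory write
value = A("at", 1_000_000)      # 42
empty = A("at", 5)              # 0, with no cell allocated for index 5
```

Because "<-" arrives as a parameter, the index and the value reach the entity together, which is what the early, separated name call could not offer.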


Alan Kay
Still trying to learn how to think better - 1y


Why is school & college the way it is, limiting most children's potential? 


This is why good questions are very often better than answers—especially when it is 
about systemic problems. 


Sometimes, when in this position, just working on “how things should be” will get 
further than trying to figure exactly how things got to be so bad. 


So, let's turn the question around to ask: "How can we best help children to maximize 
their potentials, while using the best conceptions of education we can come up with?” 


Again, this doesn't have complete crisp answers, but it does give us some starting 
points. 


For example, a really good starting place would be to have society really understand 
what Maria Montessori wanted to do and did to help children — in part because she 
took on our second version of the question as her life’s work, and — as an authentic 
genius — came up with many great insights and starting places. 


I think reconciling an enlightened understanding of the best world for a child with
best conceptions of what adulthood should be like is a very good way to start working 
on the future. 


It is hard to see how a start can be made without a combination of idealism and 
realism that is the heart of modern thinking. 


Alan Kay
Still trying to learn how to think better - 1y


What is the major obstacle in teaching computing in general and coding in 
particular in primary and secondary schools? 


This is a good question that can lead to even better questions. 


First, I think the problems with most of education — including about computing —
are systemic, with many parts failed or missing. In addition, the current failed system is 
large and active about protecting what's left. 


Second, “coding” today is a trick word: it has been co-opted for a too tiny subset,
and even its larger meaning from the past misses what the larger meaning of
“programming” used to signify.


Curriculum needs to not just be better, but qualitatively different. Like writing, it is a 
lot of work to even produce a flop, let alone something really good. 


Of the many problems of curriculum, one is the strong tendency today not to include
any important matters where it is felt that “the teachers are not up to it”. (I've been in
many meetings where good ideas never made it for this reason.) 


Most teachers are working like mad, and are in very difficult situations regarding 
students, cultures, management, politics, etc. before even grappling with 21st century 
high knowledge. And one of many causes of these problems is that today's teachers 
are generally themselves the product of the failed system. 


However, the state of the teachers is a critical part of any improvements that are
attempted.


Human beings are set up by nature to learn from their cultures; however, writing,
math, science, etc. had to be invented. In many senses, school exists to teach the hard
to learn stuff that doesn’t arise naturally in cultures via genetics and traditions. One of
Montessori’s great ideas was to make the whole school the embodiment of the way
the outside culture should be by embedding it in the processes of the school (rather
than trying to teach the ideals directly in a classroom). I think this is a very good idea!


However, no teachers were more trained or deeply vetted than Montessori’s. Right 
now our society does not actually value education, and is not up to supporting it by 
vastly improving teachers in every way — including paying the good ones today what 
they are really worth. 


If we take “computing” seriously — and I do — then in its relative youth, it needs
analogies to similar difficult fields to help see how it should best be taught. Two that
can be helpful are “writing, reading, literature”, and real mathematics for the general
public.


The first of these is easiest to discuss. School “likes output from students”, and it has 
shown that it doesn't care much about the quality or relevance of the output. 
Processes have tended to shrink to the easily testable output that the school systems 
feel comfortable with. 


Frank Smith points out that *ideas* are at the roots of writing and reading, and these 
are not only the core of “literature”, but are larger in many ways than current literature 
itself. For all these with regard to children, we need to look at children’s books of ideas 
that are not only readable, but upliftable. If we look at ideas in the 19th, 20th, and 21st 
centuries, we need to recognize that many of the most interesting and central involve 
science, systems, mathematics, technology, and improving how to think. (It is hard to 
find children’s books with much real content about any of these — or to find 
teachers.) On the other hand, one of the most important traits of science, systems, 
math, etc., is that they are *arts*, and thus defunding the arts to teach e.g. science is 
robbing Peter to pay Paul, missing the point really badly. Without the ability to sense 
and appreciate and make beauty, much of modern striving is a dead end. 


It's worth thinking about writing for a moment. It is a big deal to learn how to write 
sentences, but an enormous error to confuse this with “writing”. What's most difficult 
in writing is to create a larger system of connected ideas while retaining something 
graceful in the small. 


I think we can apply this directly to thoughts about computing and its learning.
Computing is, overall, really about systems, not computers. And learning about
systems, and thinking in systems terms, is right up there as a new standard subject for
all children on their way to adulthood. I think we can put “ideas in systems terms” into
our version of Frank Smith's core ideas. Now we have to think about — and in the 
world of how children think about things — what should “reading” and “writing” be? 


Good answers will require not only a revolution in public schooling, but a revolution 
from what is now a kind of pop culture around computers, that much of the time can't 
see the systems for the bits. 


Lots of real deep work ahead. 


Alan Kay
Still trying to learn how to think better - 1y


What was the staff size at Parc? 


Parc had 4 labs: computer science, systems, physics, optical. The ARPA culture that 
formed most of the computer part of Parc pervaded most of the first two, and none of 
the others. My group and I were in the Systems Lab and both Systems and Computer
Science labs worked closely with each other (the reason for two labs instead of one 
was mostly a funding ploy by Bob Taylor). 


A lot of what Parc is known for was done in the first few years by both labs. Butler 
Lampson — a principal figure at Parc — has estimated that the number of researchers 
was about 25. The balance of researchers to support staff at Parc was about 1:1. 


An important historical fact was that Xerox did not allocate headcount on the basis of 
budget, but used “slots” (which were like gold). Whenever a slot became available we 
would try to find and hire a star. This led to a very unusual (and effective) makeup of 
lab denizens! 


This is one point in time. A few years later I could count about 40 or so researchers.


The other two labs were similar in size and support. A simple calculation shows that 
Xerox paid for all of Parc many times over with just the profits from the laser printer 
alone ... 


A possibly interesting side note is that the salaries were not high at Parc, but that the
average yearly hardware support for each researcher was very high: much more than
their salaries. This is because the aim and reality of Parc was to spend enough money
to “live in the future” by anticipating what would be possible with Moore's Law in 15
years, and to duplicate those conditions for the researchers by inventing and building
that future. This was relatively expensive per person, but the low head count made it
affordable.


Alan Kay
I am the Alan Kay in question. - 1y

Could Xerox PARC have been done remotely with today's technology (not 
necessarily methods)? What does Alan Kay think about "remote work" for 
collaborative research? 


This is an interesting — and tricky — question. For one thing, attempts to create “a 
Xerox Parc” in any form have not been successful. I think the previous ARPA
community was critical to helping shape most of the researchers who formed Parc, 
and that many of the critical researchers at Parc were not replaceable. 


For another, among many aims, Parc was part of a tradition of research to create 
“remote work for collaborative research” predating Parc (e.g. Engelbart) — and Parc 
had several active projects around this idea, as did Negroponte’s Arch-Mac group at 
MIT. 


The “today’s technology (not necessarily methods)” part is also critical because serious 
technological resources are required to even have a chance. For example, while I was
at HP and Disney I got to see and use HP’s “Halo” collaboration technology done
originally for DreamWorks to do live media movie editing between the UK and LA, and
then was able to later see Cisco's experimental collaboration tool (which had been
used between Hong Kong and LA to build itself). And later, to see Ishii’s “Liveboard”
inventions, Applied Minds’ schemes, etc.


Another tricky part is to notice that what we were working on at Parc would have been 
easy or not needed at all if the technology to create it remotely had existed. 


The best answer I can come up with here is that some of what made Parc work could
have been done remotely — but I don't think all. I think much of the actual fabrication
of SW could have been done remotely — but the *main work* of the SW (the problem
finding and design, etc) would have been difficult to impossible remotely. 


Part of the problem of remote collaboration is to solve the problems of both “getting 
things done together" and “arguing without alienation” — especially in groups — 
mismatches. I think the HP and Cisco technologies did a very good job of transmitting
enough “presence” to get a lot done, and also to allow for a fair depth of arguing. 


There have also been enough experiments in eye contact and how to do it, to go 
further. There are deeper problems of trust and trust building that are really hard to 
do with remote technologies, and also problems of “intimidation” and social coercion 
that need to be looked at. (For example, take a look at “focus” at a live concert or 
theatre vs. canned versions of the same thing, and think about intermediaries to these 
experiences.) 


Both “goofing around with each other” and the nature of the “goofing” were critical to 
Parc, and quite a few of the most important visceral processes are not readily 
subsumed by any kind of technology. 


Alan Kay 
Still trying to learn how to think better - 1y


What are some notable reimplementations of SketchPad by Sutherland? 


I don’t know of any reimplementations — but I wish there were.


Alan Borning's “ThingLab” thesis at Stanford/Parc improved on a few of the Sketchpad
ideas, but did not cover the whole territory. He is also responsible for numerous 
constraint solver systems — including Cassowary, which is the heart of many of the 
mechanisms for doing automatic layout, especially of web pages, etc. 


Sketchpad was one of the deep roots of CAD — I don’t know of any CAD systems with
the “deep feel” of Sketchpad, but there is lots of “surface influence”. 


The bridge demo in Sketchpad has been redone a few times — I have used a version
by Yoshiki Ohshima and Vanessa Freudenberg in a number of talks now online (one 
was the “Turing Centenary” talk). 


Brian Silverman has wanted to emulate the TX-2 to run the original Sketchpad 
assembly code, but for a variety of reasons, including the ever changing TX-2, this has 
not happened. 


Ivan's thesis is likely clear enough so that a reengineered version of it could probably 
accomplish a lot. This would be a good target for an MS degree I think.


Alan Kay
Have designed a few programming languages - 1y

How did Alan Kay decide to take inspiration from a biological system - 
(operations of a biological cell) to an (human) engineering system namely 
object oriented programming...was it just random inspiration or was he 
intentional about it? 


The actual history of the particular set of inspirations is given in “The Early History Of
Smalltalk”, written for the ACM “History of Programming Languages II” in the early
90s.


Many of these kinds of decisions depend on how loosely or strictly one sees analogies 
(I'm on the loose side, but short of being a mystic). 


One of my major concentrations in undergrad was Molecular Biology (the other was 
pure Math). In 1965 Jim Watson published “Molecular Biology of the Gene”, which
among other things had one of the earliest assays of a living organism. And of course,
I learned the biochemistry of the level below life.


I was a working programmer at the National Center for Atmospheric Research, but
was not “thinking systems” at this point — I was basically a “data app” programmer.


The next year I went to grad school, and accidentally wound up in an ARPA computer
research department. ARPA was all about systems and the first one I saw shocked me
into waking up and looking around. This was Ivan Sutherland's Sketchpad. A week
later I had to learn the up to then unknown language Simula. Then I learned about the
plans for the ARPAnet. Etc.


As I recount in the history, I was finally presented with enough examples to get me
thinking about how to handle complexity in somewhat the way Biology has done it
since the Cambrian Era: by eventually being able to combine billions, even trillions, of
components using universal building blocks (for example, our human bodies start with
one cell and one set of DNA, and wind up with about 10 trillion cells of about 250 cell
types, still with the same DNA). Each cell uses quite a bit of “overhead” to maintain
itself, and contributes a little to the whole organism.


When the practical world of computing is factored into this (especially in the late 60s), 
what one gets is a vision that is a bit more comprehensive than either Sketchpad or 
Simula, but more or less along similar lines. This requires omitting many interesting 
other things that Biology has to offer, but you still get a lot. 


Finally, I should mention that I was not trying to invent a programming language, but
rather an extensible kernel — a definitional scheme — for specifying universal “cells”
which could then be organized into a desired system. For example, Smalltalk is
actually the *idea* of definition, it doesn't tell you how to program. The “image” — an
example system that is shipped with Smalltalk — is but one of many ways to use the
definitional idea.


(In practice, we should have explained this distinction better.) 


Alan Kay
Still trying to learn how to think better - 1y

What can you say that everything you see in nature is governed by the 
patterns based on mathematical concepts and theories? What is it about 
mathematics that might have changed your thoughts about it? 


IMO, this question seems backwards — it is far more likely that nature is just the way 
it is, and we are the “pattern assigners” to the phenomena we can detect. I say "IMO"
because the overall question is more philosophical than scientific (which IMO is a 
higher level of opinion than this). 


We call the patterns we use to reason with *mathematics*. We like stories that seem to 
have some coherence, and the deductive/inductive framework helps make better
stories. Explanations of phenomena are never absent from the thousands of cultures 
that have been studied. 


The big shift from “math thinking” to “science thinking” started to happen when 
phenomena and their explanations were looked at much more closely, and more 
evidence of special kinds in phenomena started to be demanded. 


For example, there is no evidence that “nature is governed by the patterns based on 
mathematical concepts and theories”. 


Two quotes frame up this point of view. The first, from Kenneth Clark: “People in the 
Middle Ages were as passionate about Truth as we are, but their sense of evidence was 
very different” — and from Albert Einstein: “As far as the laws of mathematics refer to
reality, they are not certain, and as far as they are certain, they do not refer to reality”.


People who like pondering ideas like this should get and read the well known physicist
Eugene Wigner's essay “The unreasonable effectiveness of mathematics”.


Alan Kay 
Still trying to learn how to think better - 1y


How did Xerox PARC invent the mouse? 


Xerox Parc didn't invent the mouse. 


The original mouse was invented by Doug Engelbart and Bill English in 1964 as part of 
investigations into how to use pointing and gestures as input to interactive computing
systems.


This uses an old idea — that dragging two disks at right angles will produce 
differential spins of each disk depending on the direction of drag. In this case, each 
disk is attached to a potentiometer that will change the voltage through it at the same
rate. This can be interpreted as distance traveled in x and y. 
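
The geometry can be sketched in Python as follows. This is an idealized model under stated assumptions: the function names and the `VOLTS_PER_INCH` scale factor are made up for illustration, and a real mouse would also have slippage, noise, and potentiometer limits.

```python
import math

VOLTS_PER_INCH = 0.5  # assumed scale factor, not a measured value


def wheel_travel(distance, angle_deg):
    """Split a drag into the travel seen by the x wheel and the y wheel."""
    angle = math.radians(angle_deg)
    return distance * math.cos(angle), distance * math.sin(angle)


def voltage_change(distance, angle_deg, volts_per_inch=VOLTS_PER_INCH):
    """Each potentiometer's voltage changes in proportion to its wheel's travel."""
    dx, dy = wheel_travel(distance, angle_deg)
    return dx * volts_per_inch, dy * volts_per_inch


def position_from_voltages(dvx, dvy, volts_per_inch=VOLTS_PER_INCH):
    """Interpret the two voltage changes as distance traveled in x and y."""
    return dvx / volts_per_inch, dvy / volts_per_inch


# Drag the mouse 2 inches at 30 degrees from the x axis.
dvx, dvy = voltage_change(2.0, 30.0)
x, y = position_from_voltages(dvx, dvy)
```

A drag straight along x spins only the x wheel, a drag along y spins only the y wheel, and anything in between is shared between the two according to the cosine and sine of the drag angle.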


This worked just well enough, and mice were relatively easy to make. Engelbart and 
English soon moved to a smaller 3 button mouse which was used for the NLS system 
of the "Mother Of All Demos" in 1968. 


For the Alto at Parc in the early 70s, we initially wanted tablets, but good ones were 
still quite expensive to make. We looked at the NLS mouse for a variety of reasons 
(another story) and wanted one that was of higher resolution, and that was small 
enough to allow drawing with it. The first Parc mouse used digital shaft encoders 
attached to small ball bearings, and IIRC had a resolution of over 200 to the inch
(250?).


I really loved that mouse. However, it would get dirty, so other attempts were made,
including a very successful “ball mouse”, and an equally successful first optical mouse 
(Dick Lyon). 


Later, at Atari, we did a “flea” which was a very small optical mouse (it worked really
well).


Alan Kay 
Still trying to learn how to think better - 1y


What became of the work done at VPRI? 


The website for papers still exists: Viewpoints Research Institute. You can also look
at “Croquet.io” for ongoing work. Also take a look at “worrydream.com” for Bret
Victor's work. 


Alan Kay
Still trying to learn how to think better - 1y


What was the oldest computer or device you remember using? How does 
this compare to the machines you use today? 


The first programs I wrote that ran on a computer were done ca 1962 in the Air Force
for an IBM 1401 with 8K 6-bit characters.


However, while in grad school a little later I got to write a few programs for the
Lincoln Labs TX-2, some of whose HW dated to ca 1956.


Alan Kay
Still trying to learn how to think better - Updated 1y


If Alan Kay was doing a PhD thesis today, would he write all software from 
scratch? Is that more than what is actually needed to get the point across? 

Ivan Sutherland once pointed out that “A thesis is something 3 people will sign”, and 
Dave Evans saw getting a PhD as a “necessary evil”, a “union card” that you needed to 
get but shouldn't spend much time on. These were two of the three signers of my 
thesis. 


Both of these computing greats were talent oriented, and wanted to get as many 
talents of as many kinds into the research community as possible. Here “What Is 
Actually Needed" was “a significant advance in computing without using up much 
future research time”. 


They also encouraged consulting “because that is the real world” etc. 


One way they did this was to encourage and allow students — if so minded — to 
choose and work on their own projects, to see how they functioned as “real 
researchers who didn’t yet have a PhD”. These projects could be big ones or smaller 
ones. They wanted you to finish the latter, but didn’t require it on the big ones. 


Dave Evans got me a job consulting on a small desktop computer project at an 
aerospace company almost immediately after starting grad school, and several years 
later when I asked Dave what I should do for a thesis, he said “Just write up what you
are doing”. That turned out to be a very untidy mishmash of ideas, but many helped
later on at Parc. “Three people”, including Dave and Ivan, signed this, and I didn't look
back. 


I did write code as part of this, including microcode, but the best code was written by
better coders later. The “thesis” was “It is now time and very possible to invent an 
interactive graphical personal computer”, and the mishmash was a lot of the things 
that needed to be done, and how they could be approached, and sometimes with real 
examples. 


I think it was signed because — as rough as the mishmash was — I had made my
point. 


On the other hand, earlier, Ivan had sole access to a supercomputer 3 hours a day and
programmed and designed a tour de force — Sketchpad — in a year. He still used
some tools done by others. He really made his point (I think much better than I made
mine) — but both points were made. 


A side comment: If we look at the question, and take “what is actually needed”
(WIAN) as a threshold that implies a qualitative difference between below and above,
then it seems clear to me that we only have to do “all software from scratch” if that is
part of the point that needs to be made (and in a few cases it might be, or there might
be no other route).


Another comment: It's worth noting that the “before computer” standards and
norms don't really work well for many computer PhDs. One facet of the field can be
looked at as “math”, and so at least part of this can be done solo. Another facet can be
looked at as “design” and can sometimes be done solo (but good design often
requires testing and iteration, which often requires building). And then we have that
much of computing has to do with real world (and real time) issues, which do require
not just building, but often a fair amount of engineering (in my experience, the best
stuff has a combination of invention and engineering and iteration).


But grad schools generally don’t cater to the actual processes of computing. This puts 
the "single contributor” idea of a PhD in question. 


Another problem is that — today especially — most grad students do their computing 
on vanilla laptops etc, which means they are computing in the past, and also 
competing with highly optimized SW done by teams. In my era, we used or built “time 
machines” — supercomputers etc — that allowed a single user to “compute and think 
in the future” (that would come via Moore's Law). This was one of the keys for the best 
stuff in the 60s and 70s. 


So: ways to do a real PhD as part of a large real invention team would help. Being able 
to use supercomputer cycles to avoid having to optimize would also greatly help. 
Getting universities to revamp their theories of the field and what a PhD should mean 
would really help. Etc. 


Also: see the comment by Jennifer Ramirez below, and my reply. 


Alan Kay
I am the Alan Kay in question. - Updated 1y


How does Alan Kay think Ivan Illich performed as a problem-finder? 


I think anyone interested in education past, present, and especially future should read
Illich's books — particularly “Deschooling Society” — for an early deep critique of
standard schooling — and “Tools for Conviviality” for crucial perspectives on our tool-based
species.


I would rate him quite high as a “problem finder”.
Illich's own words about the problems of schooling: 


The pupil is thereby "schooled" to confuse teaching with learning, grade 
advancement with education, a diploma with competence, and fluency with the 
ability to say something new. His imagination is “schooled" to accept service in 
place of value. Medical treatment is mistaken for health care, social work for the 
improvement of community life, police protection for safety, military poise for 
national security, the rat race for productive work. Health, learning, dignity, 
independence, and creative endeavor are defined as little more than the 
performance of the institutions which claim to serve these ends, and their 
improvement is made to depend on allocating more resources to the management 
of hospitals, schools, and other agencies in question. 


Alan Kay
I read a lot - 1y


What does Alan Kay think of James Kunstler’s “The Geography of 
Nowhere”? 


Summary: worth reading overall, could be more aimed at *why* people/society are the 
way they are rather than just the nutty, even crazy things people have done and are 
doing without the notion of consequences. 


This book is on Kindle (for $0) and I encourage taking a shot at it.


Alan Kay 
Still trying to learn how to think better - 1y


Why did so many of the innovations at Xerox PARC not make it into 
mainstream computing until several decades later? What lessons can future 
innovators learn from this? 


It's worth looking at the history of the Ethernet. It was of interest to the outside world 
because there wasn't a workable solution to “local area nets” in the 70s, and the 
Ethernet worked nicely (if a bit oddly to conventional thinking). Still, it had a pretty 
straightforward path involving other companies (e.g. DEC) and making a new 
company (e.g. 3Com), and doing considerable engineering on connectors,
speeding it up a little, etc. 


The GUI was even simpler: to see it was to get the basic idea — a bitmap display and
pointing device were more or less required, along with enough memory and processing
power to do the rendering. This required an executive with influence (Steve Jobs) to
see the power of the idea (this happened 6 years after the invention of the Alto), but
the Lisa had a good version of the GUI about 3-4 years after Apple decided to go for
it.


It's interesting to note that these two inventions did not have an alternative viable 
form, so there were niches they could fill. Note that both got easier and easier to do 
as Moore's Law expanded. 


As an extreme example in the other direction, we could look at some of the 
programming practices at Parc, quite a few of which have not been generally
adopted even to this day. There is a lot of evidence that the main reason for the 
difference was that everyone in computing already thought they knew how to 
program, but it was clear they didn’t know how to do a LAN or GUI. 


My bottom line is that I think there is a general aversion to really learning new things
in the outside world, so: NIH! partly from misplaced pride, partly from faddism, and 
partly from laziness. 


The ARPA/Parc community was generally better at being willing to learn and use most 
already existing good ideas, and especially to share new results. I think this might have
been partly due to the long history of this community, which stretched back to the 
WWII years. 


Alan Kay
Still trying to learn how to think better - 1y


Why was Charles Babbage unsuccessful in creating a working mechanical 
computer, when the Jacquard loom had already been invented? 


It's interesting to compare the recreation of “the difference engine” with the Jacquard 
loom. The approach to engineering and practicality could hardly be more different. 
The difference engine was much simpler than the analytical engine, but could barely 
be cranked because of the amount of backlash in the scheme (I've cranked both 
recreations ...). 




Of special note is to look at many chains of base 10 gears in precise alignment of the 
engine vs. the “rattley ad hocness” of the wires and guides of the loom (below). 
Jacquard's approach was to “let things flop around” — basically to be noisy when it 
didn’t count — and to get preciseness only when needed by getting the flopping 
parts — usually wires — to go through guides when precision was needed. 


The bottom part is a pretty conventional loom. The left side is the card mechanism. 
The top is the card reader that — nice and simply — uses the holes in the cards to 
shift hooks for the many wires that reach down to pull up or not the warp threads. 
Take a look at 


Jacquard machine - Wikipedia: https://en.wikipedia.org/wiki/Jacquard_machine


and 


to get a sense of the truly remarkable set of inventions and mechanical approaches 
that allowed these to work — early on with just one operator power — at the rate of 
about 1 or 2 cycles per second. 


Babbage could see that the card programmable loom could also be programmed for 
other things, but was unable — unwilling, probably — to work out what should be 
done downstream. It would probably not involve gears, would not be base 10 for most 
of the mechanism, and would still be very difficult to make work. For example, I think
both the engine and the loom approach would need some form of restoring logic in 
order to work — this can be done mechanically — and not easily at all — and, for 
example, is much easier to do pneumatically. 


The analytical engine was thought about ca. 1837, just about the time the “Barker
Lever” was invented in England as a pneumatic force amplifier mainly used to
“power assist” the key action in large pipe organs. This is done by using a weak pull to 
open a valve for much more powerful air that can then do the work (usually opening a 
much stiffer valve). 


The greatest interest and early use was in France with the first large instrument 
completed in 1843. The “trackers” on these organs are rather like the “wires” of the 
Jacquard loom, and it is possible to see how a device that uses mechanical wires can 
be given intermediate power boosts via the Barker mechanism. 


A pneumatic computer was certainly possible using the technology of the mid 1800s, 
but there is no record of such a device. 


I think that Babbage lost out because of his drive for precision rather than logic, and 
though he was a terrific mathematician, he was not in the same class as the great 
engineers of his time. 


It's also worth pondering the likely negative influence of the difference engine over 
the analytical engine. A perspective to consider is that the difference engine “almost 
works", in that it can successfully calculate polynomials by cascaded addition, using a 
scheme that is on the order of an old desktop calculator. 
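“Calculating polynomials by cascaded addition” is the method of finite differences. Here is a minimal Python sketch of that method, not a model of Babbage's mechanism: the function names are made up for the example, and a list of integers stands in for the columns of number wheels.

```python
def initial_differences(values):
    """Build the starting column [f(0), Δf(0), Δ²f(0), ...].

    `values` are the first degree+1 values of a polynomial sampled at
    x = 0, 1, 2, ...; this setup step is done once, by hand, in advance.
    """
    column = []
    row = list(values)
    while row:
        column.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return column

def tabulate(column, count):
    """Produce `count` successive polynomial values using only additions."""
    regs = list(column)
    out = []
    for _ in range(count):
        out.append(regs[0])
        # The cascade: each register absorbs the one below it, top first,
        # so every addition uses the lower register's not-yet-updated value.
        for i in range(len(regs) - 1):
            regs[i] += regs[i + 1]
    return out

if __name__ == "__main__":
    # f(x) = x*x + x + 41 sampled at x = 0, 1, 2 gives 41, 43, 47.
    print(tabulate(initial_differences([41, 43, 47]), 6))
```

Once the starting column is set, no multiplication is needed: an n-th degree polynomial needs n additions per new table entry, which is why the scheme suits a machine that can only add.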


The problem is that the scheme doesn't scale mechanically, yet it is how Babbage was
thinking at the time. With his new thoughts about programmability, he also needed a
very different physical paradigm to make a programmable engine. The Jacquard loom
had more than the cards to offer: what was so interesting about it was how it handled
much larger scales of information, noise, and mechanisms. The difference engine
made it difficult for him to discard it in favor of what the new problem needed.


Alan Kay 

Still trying to learn how to think better - 1y

Hi Alan, I am a huge fan of yours. I'm just a regular computer/phone user
who is trying to get a job in the industry, but I want to help advance
computing in the ways you described. What can I do? I'm interested in
utilizing today's powerful hardware

I suggest looking at the idea of “What Is Actually Needed” (WIAN), which I've written
about in Quora, and occasionally described in talks. The center of the idea is that just 
trying to improve something often doesn't take enough context into account, and 
may wind up making the larger situations worse — especially if the larger situation is 
already a bad idea. 


One part of this process is to learn how to really criticize — especially that which has 
been around long enough so that it seems “normal” and “tested by time”. 


Another part is to find the WIAN threshold, which is often qualitative. For example, 
WIAN might be something radically different than a “better old thing”. One route is to 
get sensitized to seeing “almost a new thing”. 


Computing is full of “unfortunate old things” that computer people are happy with 
repainting (for example, the standard OSs and the WEB). It's worth trying to 
understand what went wrong to allow “containers” in their current form to be 
considered as “reasonable”. Why are most people still programming using the mostly 
weak ideas of the 60s?


Etc. 


In order to find which knowledge you need to learn and invent, you first have to 
imagine the qualitatively better contexts that require that knowledge. 


Alan Kay 
Still trying to learn how to think better - 1y
What is the list of civilization's most "powerful ideas"?
I think the top of the list is
1. There are powerful ideas 


From here on, they come in many shapes and sizes. For example, one of the biggest is 


2. Human brains are not great for thinking, but can be taught how to think much 
better 


This is part of the fact that raw cleverness (IQ) can't get far without Knowledge, and 
Knowledge can't get far without Perspective/Context. Another way to look at this is 
that our main improvements have been in *method* — for example, the powerful idea: 


3. Science 
Another powerful idea is 
4. Look for powerful ideas: try to figure out what the phrase means ... 


Years ago Seymour Papert and I tried to make such a list, and had quite a bit of
trouble organizing it. We concluded that while articulating them was good e.g. 


5. When something is complicated, find how to divide it into parts and try to 
understand the parts: rinse and repeat 


it was number 4 that was even more powerful ... 


and so forth ... (you can see why these are difficult to organize!) 


Alan Kay 
Still trying to learn how to think better - 1y


Where does Alan Kay draw the line between tinkering and inverse 
vandalism? 


I don’t think it matters where I draw the line, but having a real sense of “inverse
vandalism” (making things just because you can) can be very important in an age 
where the industrial revolution can not just make in quantity, but can oversupply. 


One key idea is that we don't have curbs genetically built in for many things (from e.g. 
sugar to fantasies/stories, social interests, etc) because these were all naturally limited. 
But they turn into “legal drugs” or encumbrances when overproduced and over 
distributed. 


It's striking, and sometimes fun, to go to a large flea market (like Portobello in 
London) and muse that everything there was not just the product of tinkering, but
was once mass produced.


I don't think tinkering is the main problem (it is very useful in so many ways). For
tinkering, a big problem is to confuse it with engineering (they should work together). 
The larger problem is to avoid overproducing underthought things that can “redefine 
‘normal’ lower”. 


Alan Kay
Invented and designed a few user interfaces ... - 1y


What is considered an early version of a graphical user interface (GUI)? 
There were a number of excellent GUIs before Parc (most of which provided ideas we 
used at Parc). My favorites were Ivan Sutherland's Sketchpad system (ca 1962-3) and 
the RAND Corp's GRAIL (GRAphical Input Language) system.


This research at RAND was part of the "DoD community”, and GRAIL was yet another 
ARPA project. 


GRAIL was a follow-on to the wonderful JOSS system at RAND, and a leap from a
typewriter based interface to one that was completely done by gesture. A fairly true 
story is that the wastepaper baskets (and also notebooks) were examined at night to 
determine some of the processes used by RAND analysts. They found a lot of 
flowcharting and diagrams, and decided to try to do a computer based flowchart 
editor. 


To do GRAIL, a very good tablet had to be invented from scratch (Ellis: 1964 — the 
same year as the mouse), a terrific hand drawn symbol recognizer was done (Groner: 
1966), and the larger whole system was done (ca 1968). To make things more fun and difficult —
but realistic — they decided to program the entire system in itself. 


There is a whole movie done by RAND, but it is worth looking at the excerpt I've used 
to show in talks, and then looking at the full movie. This is demoed by Tom Ellis, but I
talk over him to point things out to the audience: 


The 2 minute Excerpt 


Next, the whole movie, which shows many more details, including the astounding fact 
that the flowchart programming ultimately rested on a foundation of IBM machine 
code for the huge 360/44 mainframe used for this. The tablet itself was handmade at 
RAND, and you could get one for about $18,000 1960s dollars. The display system was 
over $180K in 60s dollars. About 1/3rd of the single user mainframe was used for the 
gesture recognizer. 


The Movie 


This system was many years ahead of its time — to the point that something as good 
hasn't been done since! It was by far the most *intimate* feeling GUI I've ever used. 


Some of the YouTube comments today failed to understand just what this research 
was after — which was primarily “human-computer-symbiosis” and how could the
computer be “inserted” as an amplifier of thought without feeling like a drag or 
distraction. I.e. can everything that is great about pencil and paper doing and thinking
be simulated successfully by the computer (and can some useful magic be added to 
this)? 


So the flowchart part of this (and machine code part of this) was ancillary to the main 
goals. The larger intention was to have the symbolic language be more like JOSS (a 
higher level language), but then they thought it would be quite a demo if they did it in 
itself in IBM 360 machine code ... 


Part of the idea here was to see how far one could go in “connected smoothness” (and 
in the 60s). And some of this was very related to the work of Engelbart and his NLS 
group at SRI at the very same time. 


One of the connections was the problem of how to get text characters to be part of 
the visual world in a reasonable way. If you are working and thinking graphically and 
using pointing devices etc then it is annoying and distracting to have to go to a text 
keyboard to type. Engelbart solved this with a “chord keyboard” for the left hand (for 
typing commands and text without interrupting navigation, etc.) and GRAIL did this 
with character recognition. Both of these were slower than a regular keyboard, but 
faster when the need was just for a few characters. 


An important side note is that the Gabe Groner gesture recognizer for GRAIL actually 
pipelined the recognitions, so the human did not have to wait for the previous 
character to be recognized! 
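The general idea of pipelining recognition can be sketched with a queue and a worker thread in Python. This says nothing about how Groner's recognizer was actually built: `run_pipelined` is a made-up name, the `recognize` argument is a stand-in for a real recognizer, and the only point is that input is accepted without waiting for earlier results.

```python
import queue
import threading

def run_pipelined(strokes, recognize):
    """Accept every stroke immediately; recognize them in the background."""
    pending = queue.Queue()
    results = []

    def worker():
        while True:
            stroke = pending.get()
            if stroke is None:  # sentinel: no more input is coming
                break
            results.append(recognize(stroke))  # may be slow; input never waits

    thread = threading.Thread(target=worker)
    thread.start()
    for stroke in strokes:
        pending.put(stroke)  # returns at once, so the "writer" keeps going
    pending.put(None)
    thread.join()
    return results

if __name__ == "__main__":
    # Hypothetical recognizer: upper-casing a letter stands in for recognition.
    print(run_pipelined(["a", "b", "c"], str.upper))
```

Because a single worker drains the queue in order, the recognized characters come out in the order they were drawn even though the person drawing never waits.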


The Engelbart system also provided a typewriter keyboard, which you could go to for 
larger paragraph-sized volumes of text. So hands were “out” (for navigation etc) or
hands were “in” (for inputting larger volumes of text). This worked well.


The Engelbart NLS Lap Control Setup 


Some of us thought that this would work for GRAIL even better (with the mouse being 
replaced by the stylus). We also thought that two-handed UIs were generally better
for any system one was going to spend hours each day on. 


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 1y


What does Alan Kay think independent researchers in computing should do 
for money if commercialization was part of the problem? What if "good 
funding" for research isn't available, but your ideas are worth pursuing? 

I don’t know. There is no lack of money, especially when one looks at what the
computer part of Xerox Parc cost, or even what the ARPA-IPTO community cost. 


In my own experience, I and the other researchers within ARPA/ONR/Parc, etc knew
we were in a wonderful situation. Our appreciation went up by a thousand fold after 
the great era ended in the early 80s, and commercialization of computing went into 
high gear. 


One of the reasons the ARPA etc community was so productive was the level of 
funding it gave to universities and students, and the students grew to become PIs and
mentors in the community. This started to fall apart after the ARPA funding ceased, 
and the pretty top down NSF became the main funder (and of much smaller projects). 
I've always thought that ARPA's funding of large groups, often whole departments, in 
universities made a big difference (larger goals, less rivalries, no tenure pressure, etc.). 


I was able to get smaller funding that would allow some things to be done with small
groups, but I never was able to find “critical mass funding” that would allow a big area
to be explored deeply and then made practical. This got more and more difficult and
painful over the years, and I finally ran out of energy around age 78 or so.


A key point here — again — is “community”. I've always had roughly the same sized 
groups, but we were much more productive in the large at Parc because of the 
synergy with others. 


As an old fogey now, it’s worth pondering the roles of business, government, and 
universities. In theory — and often in fact in the past — the universities were the best 
places to do breakthrough work — in part because of students, and in part because 
universities are supposed to be primarily about ideas. A good source of funding is the 
government, in part because it is supposed to be about the progress of the whole 
society, and in part because it can and does do various forms of redistribution of 
wealth to benefit the society as a whole. Businesses as they are currently constituted 
generally have much too small goals — their setup is not psychically good for making
real progress. 


If we look at these three, we can easily come up with great improvements not just on 
the current situation, but also in the past. For example, the “golden age funding” for 
computing came mostly from hot and cold wars; the government/citizenry has a hard
time with “research for progress” per se. Universities became businesses from the 
baby boom, and essentially became more and more like trade schools and sellers of 
degrees. And businesses in the US form of capitalization and its markets have a hard 
time seeing that they must have higher aspirations than just making money. 


Alan Kay
Invented and designed a few user interfaces ... - 1y


Why wasn't the GUI by Xerox PARC or Xerox Star developed earlier? There 
was Sketchpad in 1963, which was a predecessor of a GUI, 20 years before. 


The other answers are pretty good. 


Let me first point out that the Alto at Parc with its bitmap display and overlapping 
window interface was done in 1973, which was only 10 years after Sketchpad, and for 
practical reasons required the first integrated circuit RAM (just 2 years after the Intel 
1103 appeared in 1971) instead of the more expensive cores. Still, the Alto was 
expensive in 1973 dollars (and really expensive in today’s dollars) but over the years 
we made almost 2000 of them to serve as a “time machine” so we could work in the 
future rather than the present or the past. 


Second, there were some truly excellent GUIs done in the 60s after Sketchpad and 
before bitmap displays, for example GRAIL at RAND Corp, which used a terrific tablet 
and gesture recognition to go along with great UI ideas. It had to use calligraphic CRT 
displays. Another system with many great ideas at the same time was NLS by the 
Engelbart group at SRI. Also a calligraphic display, but used video as a scanning 
conversion multiplying device, and had multiple panes, etc. 


Alan Kay 
Still trying to learn how to think better - 1y


How could Xerox come up with advanced innovation tech (i.e. GUI) since 
their managements or leaders know nothing? 

First, *some* of their management *did know things*. For example, the new Chief 
Scientist they got from Ford — Jack Goldman — was a first class physicist who also 
had a good feeling for what large companies needed. He urged Xerox to set up a 
Long Range Research Center away from corporate headquarters that would look at 
doing “critical inventions” in a wide range of sciences and technologies (including 
computing). 


This wound up with another physicist — George Pake — being recruited from Wash U
(StL) — which happened to be an ARPA project — to run Parc. The ARPA connection
found Bob Taylor to set up the computing research, and he had funded both the
ARPAnet and a wide range of top young PhDs, whom he recruited starting in late
1970.


By early 1971 he already had a critical mass of some of the best computerists in the 
world. He was able to get them because Congress inadvertently took away much of 
the “nice funding” for ARPA computing, and Taylor was able to convince them that a
company could possibly fund as well. 


Part of the deal was that Xerox management was not to be in top down control of 
Parc — nor was Taylor — instead the top scientists “followed their instincts” and 
cooperated to do the “advanced inventions” (most were beyond "innovation"). 


Alan Kay
Still trying to learn how to think better - Updated 1y


In the movie “Hidden Figures” is the depiction of the delivery of an IBM 
computer. Did the components actually not fit into the computer room as 
depicted in the movie, or is that “artistic licence”? 


My original answer (that the movie scene was “unlikely”) starts with the third
paragraph. I then got interested in the movie and liked the characters and the acting,
but didn’t think the writing and directing did justice to the historical characters or the
science. Comments on the movie follow the original answer. Then I decided to get the
book “Hidden Figures” (which I found to be much better than the movie; comments
on this follow). And then I decided to get more historical documents, especially
including the autobiography of Katherine Johnson.


Bottom line: the larger subject matter, story, and people are really important and 
need to be widely told. More accuracy could have been achieved with more work, and 
would have made the portrayals even more important. 


Original Answer: I think the sequence mentioned in the question occurs around 0:41
in the movie with the delivery of an IBM 7090 mainframe. The depiction seems quite 
unlikely to me for a number of reasons. 


The largest reason is that IBM historically used an army of support people for every 
aspect of their products. “Support” was their defining feature. 


So *everyone* handling the equipment especially at early stages would be IBM 
personnel, and there would be *lots* of them. And planning was IBM's middle name. 
Their whole deal was not making great computers but to be the most comfortable 
manufacturer to their clients. 


In those days you got *permanent* people with the computer after the installation 
who would take care of every form of maintenance, including fixing the machine and 
training both operators and programmers. (This was the case even with the tiny IBM 
1401 we used in the Air Force ca 1961.) 


Movie Comments 


Also, the later “we can’t get this machine running” sequences seem quite false to me. 
As do other sequences involving this machine. That just isn't the way IBM operated (I 
didn't like their architectures, but I admired their processes).


Other scenes were just silly. For example, putting the oscilloscope probe on a different 
wire never seen before and getting a positive result (nope!). This is too bad because 
what these talented and skilled ladies were actually able to do was really good and 
very important — but the movies are a tough place to explain reality (same with the 
Turing movie, which never really touched what Turing did, how the Bombe was 
actually done, or what he was most famous for). 


In the very beginning of this movie there is the question “Isn't there anyone around 
here who knows analytic geometry?” This in front of the head engineer, who certainly 
does (as would all the other Langley mathematicians, engineers and scientists involved 
(of which there were already hundreds)). This was worrisome to see. 


Another side comment is that the 7090 was a ca 1960 transistorized version of the
physically even larger vacuum tube 709 mainframe (which they would already likely 
have — they actually had a 704 in 1957 (see below)), so adding in a new mainframe 
and dealing with it would be nothing new. Similarly, FORTRAN was launched in 1956, 
and ran on most IBM computers — both large and small — including the predecessor 
704, etc. 


Poking around to try to find a history of computing machines at NACA, I found this
picture of an IBM 704 (even more a predecessor of the 7090) at NACA in 1957: 


Note, there are both a male and female operator. The book reveals that NACA 
installed an IBM 650 (a rather fun drum oriented vacuum tube computer) in the 
mid-50s and started to calculate trajectories etc with it. The manual they would have 
used would look like this: 


The book also reveals that the 7090 in the movie was actually one of two that went to 
Goddard Space Center in DC, and the 704 above was the computer at NACA (and 
there was no 709 in between). Also, the tape drives didn't look right in the movie. IBM 
would have been using 729 drives in 1961 (too much of a nit). 


Another side comment is that the trajectory for Alan Shepard's flight (figure: “Typical
Flight Plan”) has to get the capsule downrange in a small enough area to be picked up by the
Navy. It's worth noting that the V2 rocket in WWII traveled about the same distance 
(about 235 miles to London) and as accurately (it generally targeted about a 10x10 
mile area). This still had to be calculated for Project Mercury, but it was not a new 
problem — and the path was not actually a parabola (see below). 


About race and gender 


The movie is set in 1961 in Virginia — the very same time period I was in the US Air
Force stationed in Waco, and then Randolph AFB, San Antonio Texas. My best friend
was the best programmer at Randolph, and was a supersmart black guy from New
Orleans. We once got refused any service in a bar in San Antonio because he was
black — I was from the Northeast so it took me a while to understand what was going
on — but inside of Randolph there was no hint of any prejudice.


Nor were there any hints of prejudice towards women: my boss was a woman, and an 
extremely talented programmer, and many of the programming bosses at Randolph 
were also women. 


However, all the women programmers on the base were civilians, who were mixed in 
with (fewer) male civilians. So we could guess that the Air Force was lagging in how its 
military women were used (in looking this up, I found that women were in a separate
organisation called the Women’s Air Force — which was “kind of part of” the Air Force
— until 1973, when this separation was dissolved). 


I think the situation was pretty much the same within government
science/technology institutions at the time. Some of the best SAGE (air defense) 
programmers in the 50s and early 60s were women — Margaret Hamilton was one of 
them: she soon went on to write software for the Apollo Guidance Computer at MIT, 
and quickly wound up as the head of all the operational software for the AGC (and 
eventually wound up winning many prizes for her SW that saved the first moon 
landing mission). She recounts “some, but not a crushing amount, of male resistance”. 


So I can believe that the “out in Virginia” sequences in this movie still had lots of
prejudice against blacks in 1961, but the “inside NASA” sequences with both black 
prejudice and gender prejudice don't ring true to me from my Air Force experiences. 
What counted back then, in my direct experiences, was ability. If you could show it in 
the Air Force, it didn't matter whether you were black or a woman or both. 


I have a pretty strong feeling that the same was generally true within NASA (for
example, I really doubt that NASA had segregated restrooms, whether in the south or
not; the Air Force at the same time (1961) definitely did not have segregated
anything, even in Texas). Of course, NASA was civilian, and there could have been
differences.


I felt that the movie had some important things to say, but didn't do a strong job with
them. For example, the more true to life contrasts between the prejudiced outside 
world, and the much more equitable inside world could have made great realistic 
drama and impact. 


On the other hand, the principal actors are able to transcend the material they were 
given with high quality performances that most of the time portrayed “higher points” 
and rendered the many flaws in the writing and directing irrelevant. 


Another answer (by Donjanelle Robinson) provides additional notes of errors and 
facts: 


What are the historical inaccuracies in the movie "Hidden 
Figures"? 


https://www.quora.com/What-are-the-historical-inaccuracies-in-the-m...


The Book “Hidden Figures” 


The book is many levels of quality higher than the movie, not just in historical facts, 
but in quality of writing and in depth of coverage. It's worth realizing that a movie 
script is not just limited on each very spare page, but also in length (they average 
about 1 page per minute of film, so: the script would be about 126 pages). 


By contrast the book is well over 300 pages with considerable background 
information, and with its main story starting much earlier in WWII with the expansion 
of NACA's (not yet NASA) pool of mathematicians and “computers” (human beings,
most often women, with mathematical backgrounds who used mechanical calculators 
as prime aids) — to include black women. 


The complexity of segregation in each era is well told in the book. In many ways, this 
is the center context of the book. For example, the state of Virginia had many 
segregation laws that were the opposite of what was supposed to be the case for 
federal employees, and what resulted were tugs and pulls back and forth (in fact, for a 
variety of reasons, including Senator Harry Byrd, Virginia could easily be rated as one 
of the most backwards states regarding segregation). It's brought out in the book that 
many of the technical men were from the north rather than the south, and this turned 
out to be a positive factor, both for women and blacks. 


Also, in real life there are often many people involved in events worth making a movie
about. These also become “Hidden Figures” who are omitted in a movie version (and
in this movie, having three distinct female protagonists also made it complicated even
with its oversimplifications). There are many more key figures in the book, and all are
worthy of attention.


I wouldn't omit any of the great number of facts in the book, but I found myself
making little charts to help me keep more clear who was doing what, when, and
where. I think I would have preferred longer sequences staying on single characters.


I was sorry that the author didn't make more of an attempt to explain the nature of
the work, for example what “analysis” means mathematically, and what “numerical 
analysis” means to continuous mathematics, and why having “computers” helps. This 
could be done without requiring the readers to understand much more than how 
difficult to handle math relationships can still be approximated numerically. There isn't 
much more explanation than the movie (but less of a feeling of “rabbits out of the 
hat"). 


For example, everyone can understand that a smooth curve of any shape can be 
approximated very highly by a sequence of connected short lines that are as close to 
the curve as possible (to get closer you make the line lengths shorter). This idea was 
used by Newton in the 1600s, and a nice way to do it was found by Euler (this is 
“Euler's Method” which is not explained) — a big deal here is that quite a bit of all the 
computations done at Langley would have treated difficult problems by 
approximating them with line segments and solving for each line (this might require 
several thousands of calculations). For example, Seymour Papert used a mechanical 
robot “turtle” with a pen in it to teach children about parts and wholes, analysis and 
synthesis, and forms of analytic geometry. Here's what 10 short line segments look 
like when hooked together into a polygon, and the second looks like a smooth circle 
curve but it is 360 short straight line segments. The LOGO program for this is repeat 
360 [forward 1 right 1]. 


10 sides: repeat 10 [forward 20 right 36]
360 sides: repeat 360 [forward 1 right 1]


The differential equation for this idea (in English) is “change of curvature = 0". This 
means the curvature of a circle is constant — and we can also easily see this from the 
program, where the step and the turn at each step are constant. 
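
The two LOGO programs above can be imitated in Python as a rough sketch. The helper names `turtle_polygon` and `path_length` are made up for this illustration, and the turtle is assumed to start at the origin heading along the x axis. Each pass through the loop is one short straight segment followed by one constant turn.

```python
import math

def turtle_polygon(sides, step, turn_degrees):
    """Follow `repeat sides [forward step right turn_degrees]`; return the vertices."""
    x, y, heading = 0.0, 0.0, 0.0          # heading in degrees, 0 = along +x
    points = [(x, y)]
    for _ in range(sides):
        x += step * math.cos(math.radians(heading))   # forward step
        y += step * math.sin(math.radians(heading))
        heading -= turn_degrees                       # "right" is a clockwise turn
        points.append((x, y))
    return points

def path_length(points):
    """Total length of the connected straight segments."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

coarse = turtle_polygon(10, 20, 36)    # repeat 10 [forward 20 right 36]
fine = turtle_polygon(360, 1, 1)       # repeat 360 [forward 1 right 1]
```

Both paths end where they started, because the turns add up to 360 degrees. The 360-sided path is 360 units long, while a true circle through the same vertices has a circumference of roughly 360.005, which is why the drawing looks like a smooth circle.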


“Analysis” means “take apart” and we can see how all this works if we are trying to 
understand how a complex shape — like that of a wing — interacts with air at 
different speeds. 


You can use a board for a wing but the streamlining makes it much more efficient. 
Often the streamlining was achieved by watching smoke streams in a wind tunnel and 
then carving wood models more smoothly to eliminate turbulence. The final shape 
might be difficult to handle algebraically, but (second picture) we can approximate it 
with line segments. This is “taking apart” ("analyzing") the curve into separate simpler 
pieces. 


[Figure: a cambered airfoil, with the leading edge and trailing edge labeled]


The third picture has the arrowheads removed, and the fourth picture is with many
more straight line segments, each one of which can be treated separately with easier
mathematics (still lots of things to worry about). The separate computations
of each little segment of a complex curve were a large part of the work of the human
computers.


This is both why “math can be lots of fun” and “made very understandable”! 


The same process was done for the flight paths of the capsules. For example, the Alan 
Shepard flight was not actually a parabola, but a much more complex curve because 
of the various resistances of air during the climb out and return (similar calculations 
had been done by hand for artillery shells for many decades, and one of the main jobs 
of the original ENIAC was to calculate trajectories). This was a more complex 
calculation for space vehicles because they wanted them to be as streamlined as 
possible going up, and as blunt as possible coming back. It's the green more complex 
curve below that would be approximated with line segments. Note that each line
segment is a hint for what the attitude and speed of the spacecraft should be at that 
time and place. 


[Figure: trajectory with no atmosphere vs. with atmosphere]
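
The segment-by-segment calculation described above can be sketched with Euler's method in Python. This is only a toy model: the point mass, the drag term proportional to speed squared, the `drag` coefficient, and the launch numbers are all assumptions made for illustration, not values used for Mercury.

```python
import math

def euler_trajectory(speed, angle_degrees, drag=0.0, dt=0.01, g=9.81):
    """Advance a point mass in short straight segments until it lands.

    `drag` is a hypothetical coefficient k giving a deceleration of k * v * |v|.
    """
    vx = speed * math.cos(math.radians(angle_degrees))
    vy = speed * math.sin(math.radians(angle_degrees))
    x, y = 0.0, 0.0
    path = [(x, y)]
    while True:
        v = math.hypot(vx, vy)
        ax = -drag * v * vx                  # air resistance opposes the motion
        ay = -g - drag * v * vy              # gravity plus air resistance
        x, y = x + vx * dt, y + vy * dt      # one short line segment
        vx, vy = vx + ax * dt, vy + ay * dt  # new speed and direction for the next one
        if y < 0.0:
            break
        path.append((x, y))
    return path

vacuum = euler_trajectory(100.0, 45.0)               # close to a parabola
with_air = euler_trajectory(100.0, 45.0, drag=0.001) # shorter and lopsided
```

Each entry in the returned path is the end of one line segment. With drag switched on, the path falls short of the vacuum path and its highest point lies past the middle of its range, so it is no longer a parabola.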


I also wish the book early on had not stooped to the fake mystique of “faster than the
speed of sound”. Most bullets go faster than sound, and more interestingly, in the 40s, 
the V2 rocket went much faster than the speed of sound and was not at all torn apart. 


The book is at its best in providing a large scale picture of many dimensions of 
prejudice, segregation, tolerance, determination, etc., in NACA which morphed into 
NASA after Sputnik in 1957. This portrayal was done enough better than the movie to 
make up for its lack of explanation of what the women computers were doing (and 
also to get a more accurate timeline on the many events). The book shows that a lot 
of clean up changes had already happened by the time the events in the movie 
started. 


All this got me to look at more direct sources, including an autobiography Katherine 
Johnson wrote for junior level readers "Reaching For The Moon". 


Katherine Johnson autobiography: “Reaching For The Moon” 


This book is pure gold on so many levels, and I wish the movie (or a movie) had been
made from this story. 


<more to come> 


This picture is of Melba Roy Mouton, Program Production Section Chief at NASA
Goddard Space Flight Center, in 1964


Alan Kay
Still trying to learn how to think better - Updated 1y
What is the significance of Butler Lampson's work? 


A good reply is much too large for a Quora sized reply. 


Basically: Butler is one of the most significant computer scientists in history — and
has been one since he moved from physics to computing almost 60 years ago. He was
always a “principal scientist, investigator, and thought leader” even as a grad student.
He is a quintessential example of the word "brilliant" (and the term seems to have
been coined for him).


In the 60s while hearing a rapidly delivered perfectly formed Butler talk, a fellow grad 
student turned to me and said “You know, it's just a privilege to know someone like 
Butler”. When asked about “computer science” and “computer scientist” I've said 
“Look at Butler to see an example of a top computer scientist — it's hard to put into 
exact words, but he is one”. 


The list of his accomplishments is long, and so is the list of awards. Much of his 
significance has been in how he was able to understand, invent, and influence
the large world of systems of many kinds. These include a number of significant 
operating systems (including SDS-940 and Tenex), computers (including the Xerox 
Parc MAXC and Alto), networks (including the Ethernet and the Internet), and 
programming languages (including Mesa). 


He is both a great thinker and a great doer, and both of these plus his larger than life 
personality almost literally dragged the whole field forward, and made all the rest of 
us into better computer scientists. I.e. a kind of “magical person”.


I will return here periodically with a little history and a few stories, but they are really
commercials for readers to look at larger accounts of Butler's larger than life history. 


The citations and bio for his 1992 ACM Turing Award (the highest award in 
computing) are good to start with: 


Butler W Lampson 


Butler Lampson was born in Washington DC and educated at The... 


https://amturing.acm.org/award_winners/lampson_1142421.cfm


Next, the Wikipedia article — 


Butler Lampson - Wikipedia 
American computer scientist Butler W. Lampson , ForMemRS , (born... 


https://en.wikipedia.org/wiki/Butler_Lampson


Pretty much all of his more than 100 papers are not just well worth reading for 
content, but also to savor his crystal clear style. 


Microsoft researchers and engineers working around the 
world 


Get to know Microsoft researchers and engineers around the world who ar... 


https://www.microsoft.com/en-us/research/people/blampson/publicati...


Early Butler Projects 


Butler got an undergrad Physics degree at Harvard ca 1964, and traveled to UC 
Berkeley for a PhD. As he later put it, he “walked in the wrong door” and found 
students (one of them the legendary Peter Deutsch) debugging what would be called 
the "Project Genie” time-shared computer and OS sponsored by ARPA-IPTO. He 
immediately pitched in, and essentially never left this room. Soon, he was the main 
designer of the OS, a powerful “thought-force” for the HW modifications and whole 
project, and quickly wound up as an ARPA co-PI.


The resulting system was so successful that ARPA pushed SDS to commercialize it so 
that others could buy it (as the SDS-940). It was the computer used by Engelbart for 
“the mother of all demos” in 1968, and by TYMSHARE Corp to launch the first
time-sharing business. And the design was recapped again for the next generation (as
the TENEX OS by BBN) for the DEC PDP-10, and again heavily used in the ARPA
community.


The design and approach to the design were both highly significant. The base
machine was about 0.5 MIP and 64K 24-bit words and a bit shaky (as so much of the
HW was then), so the OS had to be comprehensive, lightweight, and as fail safe
as “lightweight” could be. I had started grad school in late 1966, and found the system
to be a revelation of “sweet parsimony and balance”.


Also, Butler's “hobby project” on this system was “CAL” an optimized version of JOSS, 
that constituted a “sweet spot” for interactive computing by the larger public. 


All this was essentially Butler's PhD project. Via his obvious brilliance, and in the midst 
of the ARPA-IPTO community he was let to run free by his advisors: supreme 
computerists Harry Huskey 


Harry Huskey - Wikipedia 
American computer design pioneer Harry Douglas Huskey (January 19, 191... 


https://en.wikipedia.org/wiki/Harry_Huskey


and Dave Evans 


David C. Evans - Wikipedia 


American computer scientist David Cannon Evans (February 24, 1924 -... 


https://en.wikipedia.org/wiki/David_C._Evans


who a year or so later at Utah birthed modern 3D graphics (at Utah I was also lucky to
have Dave as my PhD designer). 


A lot can be learned about design from looking at the Project Genie OS and 
subsequent work (hint: look at his succession of papers and commentaries). I'm trying 
to work on the “significance” part of the question, so I'll mention just two of the many 
interesting insights and features in this system. 


The first is that I think it was pretty much the first OS that “was designed to fail”, in
that contrary to most code that was assumed to work (and didn’t), Butler knew that 
there would be both SW and HW bugs that would cause crashes (and this is really a 
bad thing for a time-shared system). So, a percentage of the already meager resources 
of the HW were used to both mark pages as “dirty” or “clean”, and to continuously try 
to write out clean pages to the secondary storage device. This meant (a) that a need 
for a written out page could almost always find clean storage in core to put it (without 
having to write out beforehand), and (b) that a crash generally had most of the end 
user's pages in a clean and written out state, and this along with other provisions 
almost always made crash recovery very swift. 
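
A toy sketch of the dirty/clean bookkeeping, in Python. It assumes the intent is that changed ("dirty") pages are written out in the background so that they become clean. The class `Core` and its method names are invented for this illustration and are not taken from the Project Genie code.

```python
class Core:
    """Toy model of core memory whose pages are marked dirty or clean."""

    def __init__(self):
        self.pages = {}     # page number -> contents currently in core
        self.dirty = set()  # pages changed since their last write-out
        self.disk = {}      # stands in for the secondary storage device

    def write(self, page, contents):
        """A program changes a page, which makes it dirty."""
        self.pages[page] = contents
        self.dirty.add(page)

    def clean_one(self):
        """Background step: write one dirty page out, making it clean."""
        if self.dirty:
            page = min(self.dirty)
            self.disk[page] = self.pages[page]
            self.dirty.discard(page)

    def evictable(self):
        """Clean pages can be reused at once, with no write-out needed first."""
        return sorted(set(self.pages) - self.dirty)

    def crash(self):
        """Lose everything in core; report pages whose latest contents were lost."""
        lost = sorted(self.dirty)
        self.pages.clear()
        self.dirty.clear()
        return lost

core = Core()
for n in range(4):
    core.write(n, f"data {n}")
for _ in range(3):      # the cleaner has had time for three write-outs
    core.clean_one()
```

In this run three of the four pages are clean, so they can be reused immediately, and a crash at this point loses only the one page that was still dirty.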


The second had to do with “training programmers” to pay more attention to taking 
care of their working set. The scheme had to do with the HW maps forcing a working 
set to be smaller than the physical core storage. This allowed the next user's working 
set to also be in core ready to go. The wrinkle was that a programmer could predict to 
the OS what pages would be in their next working set time quantum, and if true, the 
OS would reward by running on the fast queue. If the prediction failed (and an 
unpredicted page fetch was required) then the programmer was punished by being 
run on the much slower slow queue. 
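
The reward and punishment rule can be written as a very small Python sketch. The function name `next_queue` and the representation of pages as plain numbers are assumptions made for this illustration; the real scheduler surely had more to it.

```python
def next_queue(predicted_pages, touched_pages):
    """Pick the queue for a program's next turn.

    predicted_pages: the pages the programmer told the OS it would need.
    touched_pages:   the pages the program actually used in its time quantum.
    """
    unpredicted = set(touched_pages) - set(predicted_pages)
    # Any page fetch that was not predicted sends the program to the slow queue.
    return "fast" if not unpredicted else "slow"

good_guess = next_queue({1, 2, 3}, [1, 2, 2, 3])
bad_guess = next_queue({1, 2}, [1, 4])
```

Here `good_guess` is "fast" and `bad_guess` is "slow", because page 4 was touched without being predicted.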


Both of these — along with other special characteristics — allowed the 940s to have
remarkably efficient performance. A 940 could run several dozen users with 0.5 MIP
and 192Kbytes of RAM — and it was also able to be the workhorse for Engelbart's
group's amazing “Mother of all demos” in 1968.


Next 


If I were to write a next section here, I think I'd talk a bit about Butler's work and
influence on design, safety, security, confinement, and meaning in systems. An
important idea he emphasized from early on is “In computing, design rules change
dramatically every few years”. A lot of this is from both Moore's Law scalings, and from
the amount of learning that can happen in a few years.


This means that trying to carry old ideas and techniques forward is fraught with 
disaster, and if done, has to be done very carefully. Butler followed his own “hints” and 
tried to do new designs that fit the new situations for each new system he worked on. 
Often this had to be done with less than ideal HW from manufacturers. With the SDS 
930 that became the 940, for Project Genie, they were able to make a new memory 
mapping system, and found very successful ways to get considerable power from 
what seems like meager resources. A User Machine In A Time-Sharing Environment
covers some of the most important ideas and insights here. 


On the later CDC 6400 at Berkeley, they were pretty much restricted to using the 
existing HW while trying to come up with a radically new way to structure, protect and 
make an OS that used the idea of “capabilities” — a secure unforgeable reference 
privilege with limits — but without having HW support. 


On Reliable And Extendable Operating Systems (1969) gives a clear account of the
thinking and resultant designs of this unusual system design. 


<more to come> 


Alan Kay 
Have designed a few programming languages - Updated 1y 


How does a computer know what to do with binary code 01000001? How 
does the computer know this means A who told the computer? 

To get started thinking about this, it would be a good idea to not use the word 
“know". A better really simple analogy is how keys and locks work, where the lock is 
set up physically to accept or resist the carved patterns on keys. We generally would 
not say — except perhaps in poetry — that the lock "knew" the key. 


The so-called weak “AI” of today cannot be said to “know like a human”, so we can
leave out “know” here also. 


A good starting analogy here is how Morse code has been used in the telegraph for 
more than 150 years. Lots of methods were tried, but things worked best if the 
simplest signals were transmitted over the wires (long ago they picked long and short 
noises). Then they had to come up with patterns of these to stand for the letters and 
numbers used by humans. For example, the pattern for “A” in Morse Code is “short 
long” or “dot dash” (written as ".-").


To send the letter “A”, the sending human taps out “short long” using the telegraph 
key — this sends a short and long impulse of electricity over possibly many miles to 
an electromagnet and “clicker” which turns the electric pulses back into sounds that 
the receiving human can hear. If this person knows that “short long” is supposed to 
mean "A", then they can write it down. 


On a computer you can use “Voltage and no Voltage” for the two codes. These are
often written 0 and 1 (even though they are just “Voltage and no Voltage”).


Imagine a panel with 8 switches. Now, can we connect wires so that only the switch
settings that correspond to 01000001 will turn the light on? How? We can use double throw
switches in series:


decide that an up switch will signify "0" and a down switch will signify "1",


[Figure: a row of eight double-throw switches wired in series to a light]


and wire them so that: 


up=on dn=on up=on up=on up=on up=on up=on dn=on 


And this will pass current to turn on the light only when all the switches are thrown
correctly (something very like this is used in computers for decoding a binary number
into unary selections). Can you see how this is like a “lock” that can match to a
particular “key”?
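
The panel of switches can be imitated in a few lines of Python. This is only a sketch of the series wiring described above: the names `PATTERN` and `light_is_on` are invented here, and each switch is written as the character "0" (up) or "1" (down).

```python
PATTERN = "01000001"   # the pattern humans have agreed will stand for "A"

def light_is_on(switches, pattern=PATTERN):
    """Series circuit: current gets through only if every switch matches its wiring."""
    if len(switches) != len(pattern):
        return False
    return all(s == p for s, p in zip(switches, pattern))

# Try all 256 possible settings of the 8 switches and keep those that light the lamp.
matches = [format(n, "08b") for n in range(256) if light_is_on(format(n, "08b"))]
letter = chr(int(PATTERN, 2))   # the usual human convention for this pattern
```

Only one of the 256 settings turns the light on, just as only one key shape opens a given lock. Nothing in the circuit "knows" the letter; the link to "A" is a convention that people attach to the pattern.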


Humans can then assign a pattern like this to stand for an “idea” (like “A”).


What's fun about the computer is the extent of things that can be done just by
manipulating patterns. 


Most transistors in a computer act like switches that can be thrown off and on via 
receiving electrical signals (so computers are mostly organizations of “switchable 
switches”). 


At some point it should be possible to make something just out of patterns that 
behaves like a mind — even a simple animal mind — but this has turned out to be 
surprisingly difficult, and is still a ways off. 


Desperately Seeking Answers

Answered by Alan Kay - 1y

How does a computer work? Explain with an example. 

Originally Answered: How do I explain how a computer works in my own words?

In the answer to the parent question, Jay Nabonne gives an excellent suggestion: to
find ways to understand computers, then try to explain them yourself to others (which
will generate your own words for this).


So I think your question is mostly: “How do I understand how a computer works?”


There are many answers on Quora already for this (including from me). When people 
ask me for something good to read to start off absolutely from scratch, | usually 
recommend “The Pattern In The Stone” by Danny Hillis. It is a small masterpiece. 


Alan Kay
Have designed a few programming languages - Updated 1y


What does Alan Kay think about the harmony-oriented programming 
paradigm (https://dl.acm.org/doi/abs/10.1145/1449814.1449872)? 
Originally Answered: What does Alan Kay think about the harmony-oriented programming 
paradigm (https://dl.acm.org/ DOI/10.1145/1449814.1449872)? 

This URL results in a 404 error (the correct URL is in a comment). This is a very short 
speculative paper. In the same comment is a reference to a 2 page poster, and a 
Master's thesis by Kyle Martin - “Harmony Oriented Architecture” 
https://stars.library.ucf.edu/cgi/viewcontent.cgi?article=2765&context=etd, which
has more content. 


In the spirit of “Yes, and ... ?” that is the center of improv and tentative thinking, I think
what these papers are calling attention to — the difficulties of scaling code and 
systems, and the need for great improvements — should be greatly encouraged, 
experimented with, discussed, etc. 


I think the call for ways to have looser couplings being part of the larger scaling
solutions is good. And some of the citations, especially in the MS thesis, point to 
praiseworthy places to start. For example, I've long been a fan of LINDA as a way of 
thinking — especially as a metaphor — for how to get started, especially to try to 
coordinate via descriptions rather than domain labels. 


I also like the goal of having the system work successfully in the presence of noise
and errors. 


The rub here is partly that we need to actually do implementations to vet ideas, and 
that feasibility studies in scaling are not easy to set up. For example, the Internet was 
very carefully thought through and simulated — and Parc even had its own real 
“internet” (the PUP network) — nonetheless, the actual making of the Internet still 
required more heuristics to be added to TCP/IP in order to stabilize the larger system. 


Similarly, the massive over-all-Internet replicated pseudotime system — Croquet — 
which had its birth in the PhD thesis of Dave Reed in 1978 — had to be implemented 
and tested and redesigned and implemented many times before it really started to 
scale well. 


And earlier, Smalltalk had 5 distinct designs and implementations over 10 years (the 
last 3 were of great practical use, and all were used extensively). 


This is partly a general principle, and partly a direct consequence of the combination 
of scaling and degrees of freedom, etc. 


For “Harmony, etc.” there is not enough practice experience for vetting that I can see.


Its fans see possible good properties (and I think there are a few in there). But I also
worry about the level of solution proposed. For an example of a nit, the rhetoric about 
“encapsulation” misses that an object can expose any part it wishes to, while still being 
able to protect the parts that shouldn't be exposed. 


I also feel that much too much of the discussion is trying to preserve “bad old ways” of
doing things. I would feel more comfortable to see sample code in a new language
made for making the new kinds of system proposed.


However, I would definitely sign Kyle Martin's MS thesis because it is a good enough,
wide enough discussion to count as an intellectual contribution to an ongoing 
discussion of very important issues. 


Alan Kay
Have designed a few programming languages - 1y


What are some tips for a 16-year-old that wants to be a professor in 
computer science, specifically in type theory and programming language 
theory? 


I think the best way to be a professor in a subject is to first become a first class 
practitioner in the subject, and gradually learn how to explain it and help all levels of 
interested people to learn it. A good doer does not automatically become a good 
teacher — both parts require amazing amounts of work (and I think the “good
teacher” part is the hardest). 


One of the signs of first class practitioners is that besides being very skilled, they can 
also deeply criticize any and every part of the subject. This is especially critical in 
computing because so far it is much too ad hoc. One way to think of this is that good 
criticism is opening the door to much better ideas (and at the least allows good 
warnings of “danger ahead" to be part of the thinking process). 


In order not to get buried in what people have done and are doing, I suggest using
the question “why?” instead of “what?” or (worse) “how?” 


For example: why types? An early impetus was to help compilers generate code. E.g. 
computations with numbers don't semantically need to know *int* or *float*, and in 
many cases an algorithm will need both. Another possible use for typed variables is as 
documentation — this is using the variable name as a stand-in for an entity that will 
now mostly help human programmers. A worthwhile case to consider is what to do if 
someone sends us a module over the Internet that we want to make use of. What is its 
“type”? What are the “types” of its API. Etc. Right away we should see that we need 
dynamically adjusting systems/languages rather than the old-style static approaches 
to semantics. I.e. We need “semantic types” and the field doesn't have them yet. A 
stopgap is to have a dynamic language — it can be fixed so it can’t crash — that can 
dynamically adjust to both changes and unknowns. 


The above paragraph amounts to the start of really big criticisms of most parts of 
computing (much of which insists on living in the past, and especially using old ideas 
that don't scale). 


We can see that some form of “typing” can be really useful, but that none of the 
currently used approaches is very good (I think most forms don't really pay their way).


Similarly, one should ask about programming languages: why C or C++? Why Haskell? 
Why Smalltalk? Etc. A good answer for any of these will beg for something 
qualitatively better ... 


A good exercise — which most in computing flunk — is to be able to really criticize 
the WWW and especially its browsers ... 


I was very lucky as a grad student to have been influenced by Bob Barton — the main
inventor of the B5000 — an authentic genius. In his advanced systems course he 
handed out a list of things for us to “read and remember", and used the actual class 
time, as he said “... to disabuse you of any fondly held notions you might have 
brought into this classroom”. 


Basically, he destroyed the existing field for us (including some of the things I knew he
liked). Those who survived this class were set free of “the tyrannies of the past and 
present” and were now open to inventing better futures (sometimes radically better 
futures). 


This could be tough today because even the “leading” CS departments do much too 
much training for jobs in existing business rather than actually helping students 
become real computer scientists and teaching them not just to think for themselves,
but how to really think well. 


Alan Kay
Still trying to learn how to think better - Updated 1y


Is it possible to have a computer without critical section? 


Somebody (a gremlin?) changed the question after my answer to include the word 
“multiprocessing” (which makes a different context). 


New answer is still “more or less yes", but with a few twists. For example, some of the 
earliest multiprocessing was done by “roll in and roll out" of entire memory images, so 
it wasn't possible for one process to write into another process. This could be tricky, 
because the nascent OS code was visible to every process. 


Original answer to original question: Is it possible to have a computer without 
the use of critical sections in software? 


Yes, and most programs didn't have them in the early days. 


The first programming job I had was in the Air Force in 1962, initially doing
conversions from punched card machines to the IBM 1401. Interestingly (ironically?), 
both the multiple punched card machines *and* their plugboards were often run in 
parallel, whilst the tiny (8k 6 bit bytes) and slow 1401 generally didn’t have enough 
memory for any kind of concurrency. 


And this was also the case for the slightly larger (and also slow) Burroughs 220 in our 
shop. 


So there were no interleaved threads that could mysteriously clobber shared variables. 
And no critical sections to be protected. However, interrupts acted like transient 
threads and could clobber anything anywhere (in this case the entire program was a 
“critical section”, but some care would be taken ...). 


And, variables were still shared in single threads, and these could clobber a value in a 
variable that was expected in another section of code (this is STILL one of the main causes
of error even in relatively simple programs). 


When multiprocessing was first introduced, the idea of critical sections was thought 
up as a simple way to suspend threads looking at the same section of code to avoid 
races between threads clobbering shared values. This was initially a practical solution, 
but it scaled poorly. 
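
For readers who have not met one, here is a minimal Python sketch of a critical section. It uses the standard `threading.Lock`; the function name `count_with_lock` and the counts are made up for this illustration.

```python
import threading

def count_with_lock(workers, times):
    """Have several threads increment one shared counter inside a critical section."""
    total = 0
    lock = threading.Lock()

    def add_many():
        nonlocal total
        for _ in range(times):
            with lock:        # critical section: only one thread at a time gets in
                total += 1    # read-modify-write of the shared value

    threads = [threading.Thread(target=add_many) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

result = count_with_lock(workers=4, times=10_000)
```

With the lock in place the result is always 40000. Every thread that wants the shared value has to wait in line at the same spot, which hints at why this approach scales poorly.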


For serious multiprocessing, and especially time sharing, the parallel processes were 
hardware protected. This was unfortunately but pragmatically combined with MMU 
and swapping protections also in the HW. The combination did not scale well, and the 
HW manufacturers did not fix this (to this day). Then programmers reintroduced 
unprotected threads within the processes (a really bad idea also ... and to this day). 


Meanwhile, in the late 50s John McCarthy was specializing in “great ideas that ran 
poorly on contemporary machines". One of these was Lisp. Another was the idea of 
“fluents” (or "pseudotime”). The basic idea was not to let the CPU determine time but 
instead to simulate time along with everything else. This allowed a real definition of 
“simultaneous” and was a scalable (upwards) way to deal with and eliminate race 
conditions. (I've tried to explain this idea in a few Quora answers ...). 
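
One way to picture simulated time, as a rough sketch only (this is an illustration of the general idea, not McCarthy's formalism; `step`, `rules`, and `history` are hypothetical names), is to have every update read the frozen state of tick t and have all results appear together as the state of tick t+1:

```python
def step(state, rules):
    """Advance simulated time by one tick.

    Every rule reads only the state of the current tick; the results all
    become visible together as the state of the next tick.
    """
    return {name: rule(state) for name, rule in rules.items()}


# Two cells that swap values "simultaneously". With in-place assignment the
# outcome would depend on which assignment ran first; here it cannot.
rules = {
    "a": lambda s: s["b"],
    "b": lambda s: s["a"],
}

history = [{"a": 1, "b": 2}]  # history[t] is the world at simulated time t
for _ in range(3):
    history.append(step(history[-1], rules))
```

Because no rule ever sees a half-updated world, the order in which the rules are evaluated (or which CPU evaluates them) does not matter, and the old states in `history` are never overwritten.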


Soapbox: computing has a strong tendency to hang onto an idea — often by patching 
— long after it has become obsolete — and also tends to avoid learning new ideas, even 
when tremendously useful. This usually results in dangerous bloated messes. 


Alan Kay
Still trying to learn how to think better - 1y


What is the origin of model-view-controller? 


Ivan Sutherland — the inventor-originator of interactive computer graphics — had the 
idea of a clipping window to see a part of the very large Sketchpad world. This wound 
up being a program structure roughly like a camera which had an aperture that could 
be of any size relative to the world, and would show that portion of the world on the 
screen. Each Sketchpad object had a method that would render the object in the 
world, and the windowing apparatus did the job of transforming world coordinates to 
window coordinates (this was done as part of the rendering operation). 
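
The world-to-window step can be sketched as a simple linear mapping. This is an illustrative reconstruction in Python, not Sketchpad's actual code; the `Rect` type and `world_to_screen` function are hypothetical, and the sketch assumes axis-aligned rectangles with both y axes pointing the same way:

```python
from dataclasses import dataclass


@dataclass
class Rect:
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def world_to_screen(px, py, window, viewport):
    """Map a world point to screen coordinates.

    `window` is the part of the world being looked at; `viewport` is the
    screen area it is shown in. Each axis is a scale plus an offset.
    """
    sx = viewport.x + (px - window.x) * viewport.width / window.width
    sy = viewport.y + (py - window.y) * viewport.height / window.height
    return sx, sy


def is_visible(px, py, window):
    """Clipping test: is the world point inside the window?"""
    return (window.x <= px <= window.x + window.width
            and window.y <= py <= window.y + window.height)
```

For example, with `window = Rect(100, 200, 50, 25)` and `viewport = Rect(0, 0, 1000, 500)`, the centre of the window (125, 212.5) lands at the centre of the screen area (500, 250). Zooming is just a matter of making `window` smaller relative to the world.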


Ivan Sutherland in 1962 at the building-sized TX-2 computer working with 
Sketchpad on a truss bridge design with the screen showing part of the whole 
bridge that has been zoomed in on. He is holding a light pen used for pointing 
and graphical input. This year is the 60th anniversary of Sketchpad. 


The next version of Sketchpad was a 3D system (done by Timothy Johnson), and it 
supported 3D multiple views of the world. 


Ivan's next major project was the first virtual reality headmounted display, and this 
required tracking the head of the wearer, and using this to sample from the interior 
3D world for each eye. 


So there was a similar software structure like a telescope or camera that had one end 
in the virtual world, and had the other on the display face. Both the to be visualized 
objects and the “cameras” had six degrees of freedom. The virtual world side was 
sometimes called the “camera”, and sometimes called the “window”. The real world 
side was often called the “viewpoint” (at Xerox Parc, this was renamed the “window” to 
help explanations to Xerox). 


When I was a grad student at the U of Utah, I overlapped with Ivan, the HMD project,
and its approach (meaning: there is a world with virtual objects, and these include 
camera objects whose purpose is to look at parts of the virtual world and send these 
to be displayed on windows on a display). The simple linear equation for coordinate 
transformations was now a many degrees of freedom matrix that required quite a bit 
of special HW support (that had to be built from scratch). 


At some point I started thinking about virtual objects other than graphical ones in the
“everything is an object” ideas I was pursuing. The idea was that every object should
know how to render itself, and this should be combined with a windowing-viewing
mechanism. A fun thing was the idea to use multiple windows to show different
aspects of the objects in the view. I first saw this in Sketchpad III, then in the Engelbart
system in the late 60s (it was mostly used for filtering detail), and used the idea in the
FLEX Machine that Ed Cheadle and I did.


The earliest versions of Smalltalk were simple and slow, and included multiple 
windows and “turtles” that could be part of a window (that would then clip what the 
turtle drew). 




A Smalltalk-72 screen showing an editable view of an Elvish character, which is 
used in one of the paragraphs in “the galley editor” (an early experiment in 
desktop publishing). The galley is a collection of views of both paragraphs of 
text and a painting view with a sketch of Frodo and Gandalf. The user interface 
for a painting view pops up around it, and it in turn is made from views of the 
controls. 


Another early plan — not done until later — was to have a slippage scheme, because 
the simulation “frames” from the model could often be computed much faster than 
graphical rendering of the view, and thus one would like to decouple the updating of 
the semantics from the updating of the display. This was later actually done in a 
principled way when MVC was implemented. 


Shortly after Smalltalk-76 was done, we were visited by Trygve Reenskaug from 
Norway, who was very interested in large scale planning (such as for the Norwegian 
ship building industry). He was indefatigable, and decided to do a real planning 
system in the new Smalltalk. I attribute the first formal and working examples of MVC
to Trygve, who among other things used them to view and change a complex
changing schedule and to have different windows show lists, Gantt and PERT charts,
etc. each of which could be edited directly and would automatically cause the model 
to update itself. 




Trygve’s first planning system ca 1976-7 showing three different dynamic views 
of the same plan with the same item (A21) selected and editable in each view. 
This is arguably the first principled use of MVC. 


Adele Goldberg and Dave Robson got very interested in this and took it further to the
general idea that there is a “model” system, which has the semantics of the system, a
“viewing system” which can get the needed parts from a model for each view, and do
the screen painting, and a “controller” system that deals with the input devices
employed by the user, does the sometimes difficult inverse transformation on the view
to find what is being interacted with, and informs the model about needed changes.
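
A very small sketch of that three-way split follows, in Python rather than Smalltalk and with entirely hypothetical class names (`PlanModel`, `ListView`, `BarView`, `Controller`). It is not the Smalltalk-80 design, and the controller's inverse transformation is reduced here to parsing a text command:

```python
class PlanModel:
    """Holds the semantics: task durations. Knows nothing about display."""

    def __init__(self):
        self._durations = {}
        self._views = []

    def attach(self, view):
        self._views.append(view)

    def durations(self):
        return dict(self._durations)

    def set_duration(self, task, days):
        self._durations[task] = days
        for view in self._views:   # every attached view is told to refresh
            view.refresh(self)


class ListView:
    """Shows the plan as lines of text."""

    def __init__(self):
        self.lines = []

    def refresh(self, model):
        items = sorted(model.durations().items())
        self.lines = [f"{task}: {days} days" for task, days in items]


class BarView:
    """Shows the same plan as crude bars, one '#' per day."""

    def __init__(self):
        self.bars = {}

    def refresh(self, model):
        self.bars = {task: "#" * days for task, days in model.durations().items()}


class Controller:
    """Turns user input into requests to change the model."""

    def __init__(self, model):
        self.model = model

    def handle(self, command):
        task, days = command.split("=")   # e.g. "A21=3"
        self.model.set_duration(task.strip(), int(days))
```

Attaching both views to one `PlanModel` and calling `Controller(model).handle("A21=3")` updates the list and the bars together, because neither view owns the data.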


This more formal system was put into Smalltalk at some point — maybe not until 
Smalltalk-80 — and both overkilled and underkilled. Part of the underkill was that 
originally there was a class "Model" with a protocol for things to be viewed by 
members of class “View”. But since you wanted to view everything, the model protocol 
should be part of class Object. Etc. This eventually happened. 


And so forth. For the “simple things should be simple” part, there needed to be
default views that would be automatically included in any class definition. One of
these that was later much used, Morphic, was done by John Maloney, originally
for Self and then for Smalltalk. Similarly, having default controllers was very useful.


The whole framework was a very useful way to separate concerns (and I think even
more should be done pragmatically with the details of creating MVC subsystems). 


Alan Kay
Still trying to learn how to think better - 1y


What was the output of the computer when there was no screen and 
speaker? 


My vote for the first really working programmable computer is the EDSAC by Maurice 
Wilkes. One of its intrinsic output devices was three CRT display tubes that looked at 
various portions of the machine, including the memory. 


Using bits in memory to show a tic-tac-toe board. 


So: the first real computer did have displays, and used standard paper tape input
and a printer (from earlier teletype technologies).


Alan Kay
Still trying to learn how to think better - 1y


Computer Scientists: if you had to recommend just a single book to try and 
teach someone computer science what would it be? 


My parallel to this question would be to replace “computer science” with “Physics”. 


60 plus years ago when I went to college, good answers would have been “Sears and
Zemansky” or “Halliday and Resnick”. These were both thick tomes aimed at more
than a year of first year Physics, and covered most important aspects of the field in a
first encounter form (but with enough math depth to be like more advanced courses).


When | wrote this answer, there were 2 other answers and a comment. The 
recommendations were two books about programming, and one, more general, but 
without a needed next level of detail (the Hillis book “The Pattern in the Stone”).


I like all three of these books, but computer science is not the same thing as
programming, so I recommend “The Pattern in the Stone” quite a bit as an intro to
“computing” (it's a terrific book), but it doesn't have needed details, etc.


Nowadays, there is quite a bit more confusion about what “computer science” actually 
should mean. Most students I've asked over the last 25 years or so, have given an 
engineering — rather than science — definition. In the 60s, the first Turing Award 
winner — Al Perlis — said “It's the science of processes: all processes”. 


Science is trying to make theoretical representations that mirror phenomena and their 
possible causes. The phenomena could be generated by nature or via constructed 
artifacts (like bridges and dams, etc). In the latter cases, we have both theories that 
guide the constructions and the usefulness of trying to find better theories for the 
artifacts. In the case of programming, we are essentially getting a computer to run a 
theory to produce phenomena. 


And so forth. I'd love to see a single book that has that level of sweep! And there 
might be one that I don't know about. Right now, I think it would require cherry
picking for topics from a variety of separate books and papers combined with a 
particular pretty unified conception of what “computer science” today should be 
about. 


The lack of this makes me think that we don't quite have a real field yet, in the sense
of “science field”. But I do think that you could do a lot by fleshing out Hillis’ approach
to encompass Perlis’ early posit. For example, you could look at an amalgam of
approaches like that of Nisan & Schocken, and of SICP, and general systems (and
these can be simplified without removing depth, etc.).


Alan Kay
Have designed a few programming languages - 1y

Can they discover a way for programming digital devices without 
programming languages? If yes, like what? 

Punched card machines, e.g. from IBM, were digital devices that were programmed
with plug board wiring, not programming languages. Some of them were quite
capable.


I did a little of this in the early 60s before learning the 1401 in the Air Force ...


Alan Kay
Have designed a few programming languages - 1y


The dominant paradigm in modern programming languages is Object-Oriented.
Why should developers abandon this paradigm and adopt Golang if their
application doesn't have a huge number of concurrent IO requests?

As I tried to point out in a previous answer, I don't think that “object-oriented” is a
“programming paradigm”; instead it is a way to define computational things at will 
(because the building blocks are semantically whole computers). So it is basically a 
meta-framework. 


I think a “paradigm” for programming should be both a style and somewhat 
prescriptive as to how to go about using the basic materials to make things. 


For example, “object-oriented” allows things like *data structures and procedures* to 
be defined, or the kind of *functions* that “functional programmers” like, or highly 
modular systems that integrate separations of concerns, and so forth, including *old 
style AI* expert systems with complex relational ontologies, etc. Some of these scale
well, and most have great difficulties. 


I always judge a programming language by how “meta” it can be when needed. What
is required when “Paradigm X” is required to best program a new system? One hint is
to ask to what extent, and how practically, new features can be made part of the
programming language being used.


This requires the ability to define and install new features — with regard to syntax, 
semantics, and pragmatics — at any level of a language (including its own building 
blocks). 


This has been experimented with over the years by various top researchers (one that 
was pretty nice is described in the first 100 pages of “The Art of the Metaobject 
Protocol” — this is not the only way this could be done, but it is very suggestive of 
good pathways). TaotMOP was done almost 40 years ago, so there’s a good chance 
that something better has appeared since, and I just am not aware of it.


The key idea is nicely independent of the exact methods used, so there is more leeway 
than most people think. I think this whole question is a difficult area for the
computing of today to contemplate without feeling overwhelmed, because it is so 
design intensive, and today’s general approach is very “agglutinative” with less 
forethought and overview than most situations actually need. 


Alan Kay
Still trying to learn how to think better - 1y


What does a computer without an operating system look like? 

Like many computers of the late 40s, the 50s, and into the 60s. They were “bare” HW 
plus a few external utility programs, for example, the IBM 1401, or early DEC 
computers. 


Sometimes one would write what today might be called a “mini-operating system” to 
aid work flow. For example, we had an 8K 1401 in the Air Force, and were able to put a 
tiny batch job “OS” (in the upper thousand characters of memory) that would
automatically deal with running batch jobs, telling the operators what tapes to mount, 
etc. 


We also had a Burroughs B220 in which a data tape had as its first segments, code 
that understood the format of the tape and “object style” had a table of operations for 
the semantics and pragmatics of the rest of the tape. 


In any case, the notion of an "OS" was early in one respect (as code to provide a better 
programming environment than the bare hardware did). In many examples in the early 
days these were supplied by macros from a standard library. 


The ca 1960 desire to share resources for multiprocessing and time-sharing forced 
development of something more like modern operating systems on one hand and 
memory protection and mapping on the other. The latter lagged the former (for 
example, CTSS on the 709 at MIT rolled in and rolled out whole core images (it had no
map)). IBM was very late on most of this.


On the other hand the early Burroughs B5000 had a quite elegant fine grain memory 
allocation and control scheme, and early in the 60s, not only had a resource shared OS 
but multiple physical processors, all controlled by the first OS written in a higher level 
language. 


Alan Kay 
Have designed a few programming languages - 1y


How did Ada start programming when computers were not invented? 


You don't need a computer to write a program, or to execute it. For example, an early
algorithm that was done by hand for many years was the greatest common divisor
(one of the earliest is attributed to Euclid).
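
For reference, here is the usual modern remainder form of Euclid's procedure as a short Python sketch (the function name `gcd` is just an illustrative choice). It is simple enough to be stepped through on paper, which is the point:

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor by repeated remainders (Euclid's algorithm)."""
    while b != 0:
        # Replace the pair (a, b) with (b, a mod b); the gcd is unchanged.
        a, b = b, a % b
    return a
```

Executing `gcd(1071, 462)` by hand gives the pairs (1071, 462), (462, 147), (147, 21), (21, 0), so the answer is 21.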


One of Ada’s programs was how one would compute Bernoulli numbers on Babbage’s 
proposed Analytical Engine. The Engine was never built, but she probably executed at 
least parts of it by hand (and also left a few bugs). Take a look at 


What Did Ada Lovelace's Program Actually Do? 
In 1843, Ada Lovelace published the first nontrivial program. How did it... 


https://twobithistory.org/2018/08/18/ada-lovelace-note-g.html


to get more of a feel for this program. 


A side note here is that in the early days of computing, most programmers did most 
of their debugging at their desks, via hand execution (I certainly did when I started in
the early 60s). 


Alan Kay
Still trying to learn how to think better - Updated 1y 


What guidelines would Alan Kay recommend to model something 
confusing? 


First, is the area “intrinsically confusing” or can the modeling clear things up? 


I think of most things that have a “systems nature” as being the former, so I like the
idea of “multiple covering heuristics” that together can handle the errors etc of the 
whole — a simple great example is TCP/IP, whose combined half dozen or more 
methods operate and “stabilize enough” the whole Internet. 


In more semantic areas, more semantics — especially that allow reflection — are 
usually what is needed to provide “multiple covering heuristics”. 


I think most people who think of what they are doing as “making and modifying
systems” will also approach things from the “error detection and correction” 
standpoint. 


My past research groups got a lot of mileage by building a combination of powerful 
simple languages combined with the most powerful debugger we could come up 
with. 


I think that, today, most of this needs to be done by a “coherent goal system”
itself, and that what's needed is a second or third order organization of elements that
can deal with itself. For example, Facebook has many hundreds of system integrity
tests, but they were not up to knowing the goals of the system (for example that it
needed to stay on the Internet!) and this led to at least one very large crash of
everything.


The lack of semantic modeling and reasoning etc — especially of overview goals that 
are dynamically involved with software systems — is one of the most glaring lacks in 
software today (and something that was more or less abandoned 30 or more years 
ago). 


Added April 2nd: 


Ken Kahn directed my attention to the excellent, very recent, paper by Doug Lenat:
Creating a 30-Million-Rule System: MCC and Cycorp


This is an interesting example of how to think about and deal with "modeling 
something confusing” i.e. human common sense reasoning — and it is also an 
example of the kind of system that should be behind the programming systems of the 
future. 


Declarative Programming

Answered by Alan Kay - 1y


What paradigm is the successor to OOP? 

“OOP” (as I think of it at least) is not really a programming paradigm, but is much
more a workable “universal definition” scheme, and that is especially suited for 
defining large systems. 


It is basically an abstraction of an unlimited number of complete computers
networked together (and where, by definition, the interior of the computers can
also be a system of computers, both real and virtual). If you only worry about
semantics, this provides a very powerful universal building block at all scales.


Metaphorically, this can be thought of as a “universality” like that of NAND or NOR, 
vastly scaled up: you can build any kind of functionality you want, but there is no hint 
of how to design the organization of universals. 
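
The NAND half of the metaphor is easy to make concrete. In this small Python sketch (the function names are illustrative only), every other gate is built from `nand` alone, and nothing about `nand` itself suggests how a larger design should be organized:

```python
def nand(a: bool, b: bool) -> bool:
    """The single universal building block."""
    return not (a and b)


def not_(a: bool) -> bool:
    return nand(a, a)


def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))


def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))


def xor_(a: bool, b: bool) -> bool:
    c = nand(a, b)
    return nand(nand(a, c), nand(b, c))
```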


In practical terms, since you can imitate (simulate) any idea, you could choose to use a 
real OOP framework to simulate old familiar ideas — like data, procedures, etc. — or 
you could choose to use the framework to deal with vast scalings and new situations 
brought by Moore's Law. 


Historically — and unfortunately — “OOP” starting in the 80s has generally chosen to 
simulate old familiar kinds of things (via its subset use as Abstract Data Types). For 
many reasons, this kills “graceful scaling” (and has done so).


So — for general/commercial use — "OOP" needed to be packaged not as a 
programming language — too many degrees of freedom for most programmers — 
but as a framework loaded with powerful design schema to help programmers learn 
ideas far beyond mere programming. That didn't happen. 


One way to think about “declarative programming” is via an analogy to a system of
“simultaneous equations”. If there is a solver that can solve them, then it is extremely 
handy to just add a new equation for each new situation, and let the solver find a 
viable solution for all together. 


Note that a system of simultaneous equations quickly gets difficult to gist, and
some systems, even of linear equations, don't have solutions. And many systems
of equations don’t accurately describe the desired system.
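
To make the analogy concrete, here is a toy solver for small linear systems, offered as a sketch only: it uses plain Gauss-Jordan elimination over exact fractions, and the `solve` function and its input format are assumptions made for this illustration. Each equation is a pair of (coefficients, constant), and a new requirement is just another pair added to the list:

```python
from fractions import Fraction


def solve(equations, n_unknowns):
    """Solve linear equations given as (coefficients, constant) pairs.

    Returns the unique solution as a list of Fractions, or raises
    ValueError if the equations contradict each other or leave an
    unknown undetermined.
    """
    rows = [[Fraction(c) for c in coeffs] + [Fraction(rhs)]
            for coeffs, rhs in equations]
    pivot_row = 0
    for col in range(n_unknowns):
        # Find an unused row with a nonzero entry in this column.
        candidate = next((r for r in range(pivot_row, len(rows))
                          if rows[r][col] != 0), None)
        if candidate is None:
            raise ValueError("no unique solution")
        rows[pivot_row], rows[candidate] = rows[candidate], rows[pivot_row]
        pivot = rows[pivot_row][col]
        rows[pivot_row] = [v / pivot for v in rows[pivot_row]]
        # Eliminate this unknown from every other row.
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b
                           for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    # Any leftover row now reads 0 = constant; a nonzero constant is a contradiction.
    for row in rows[pivot_row:]:
        if row[-1] != 0:
            raise ValueError("no solution: the equations contradict each other")
    return [rows[i][-1] for i in range(n_unknowns)]
```

With `x + y = 10` and `x - y = 2`, written as `[([1, 1], 10), ([1, -1], 2)]`, the solver finds x = 6 and y = 4. Adding `2x + y = 16` is harmless because it agrees with that solution, while adding `x + y = 11` makes the whole system unsolvable, which is the small-scale version of requirements that cannot all be met.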


Eventually, this will give rise to a higher way to think about this, e.g. matrix algebra.
(But still, how can this be grounded in more *meaning* than transitory goals in the
minds of programmers and management?)


Declarative programming is all this and much more (an insurance system I’m aware of 
has over 100,000 requirements — and the working system is a partially unknown 
approximation to those requirements). 


To me, all this (and more) implies that a next real paradigm (in the deep sense of the 
term) would be “knowledge based system building” of various kinds. 


I think the minimum would be to have the system pretty much understand the goals
of any new system added to it, so it can do a lot of the feasibility checking (and much 
deeper) that humans sporadically and randomly do today. 


Such an active knowledge context, for example, would have prevented the recent 
Facebook crash (in knowledge terms, Facebook's code didn’t even know that it was 
hooked up to the Internet and that it didn't want to unhook, etc.) 


Making a minimum knowledge context — and languages that could be in terms of it 
— is a monumental amount of work and invention (not unlike the amount of work 
and invention required to come up with our current wave/paradigms of computing). 


I don't see much taste for this amount of work and invention in the field today, nor
any inclination from funders to try to do a restart of the field. 


Paradigms in Physics are large qualitatively separated and different whole points of 
view (like Newtonian Physics vs. Quantum Physics). If we take that to heart with our 
field, then we should be looking for something revolutionary (and likely: large). 


Alan Kay


Still trying to learn how to think better - Updated 1y 


What does Alan Kay think about Douglas Lenat's Cyc? 


CYC is such a large complex and long lasting effort, that it doesn’t fit within the scope 
of a Quora answer. But as with most artifacts in computing, commentary can be 
somewhat separated into the *idea* of a system, and the processes to *implement* 
the system. 


The original idea that led to CYC was articulated by Doug in the early 80s as: “Instead 
of automating the words in an encyclopedia, we should automate the ‘white space’ ” 
— that is, we should automate what a reader of an encyclopedia needs to know in 
order to read an article in an encyclopedia, i.e. all the commonsense knowledge that is 
used as context. And let's do it as an “engineering project”, for example, can we (by 
brute force if needed) create an artifact that can respond really flexibly (rather than 
brittly) to many different kinds of contexts, etc.?


Doug likened the expert systems of 40 years ago to brittle islands, and what was
needed was a “commonsense canoe” as a universal fall back process to navigate 
where there was little or no expertise. 


This is a great insight and idea. It is pretty much lacking in the so-called “AI” systems
of today (and it needs much much more than ML type correlations in order to be 
realized). 


An enormous amount of work went into a number of CYC systems, which wound up 
with millions of relationships to try to model commonsense reasoning. These went 
deeper than any other artifact that I'm aware of, and had many important successes. 
This work had to struggle in the “AI winter” as ML drew interest and funding away
from the deep requirements of “real AI”.


I should be clear here that the CYC symbolic approach is not antithetical to ML: both
are needed. The problem today is that the symbolic approach has been omitted or is 
too weak to do its deep part of the job. In today’s terms — now that Kahneman wrote 
his book on Thinking — what is lacking is the deep slow thinking in terms of models 
of the world, rather than the fast correlations and reactions of the Type 1 quick neural 
apparatus. 


A bottom line here is that CYC is well worthy of study, both for what it can do that 
other systems can't, and also to learn more about how such a system should be 
organized for a next pass. 


In any case, we should be grateful that Doug Lenat has been a bulldog on this project. 
Without his persistence, we would have no artifact “good enough to be criticized”. 


Ken Kahn provided a very useful comment below that I'm copying in here: 


This led me to find and read the excellent, very recent, paper by Lenat: Creating a
30-Million-Rule System: MCC and Cycorp


And the paper ends with a nice example of how ML and Cyc can work together as 
Alan pointed out. 


One minor quibble is the discussion of expert systems: while the approach was popular at
Stanford, a few other places, and some startups Lenat describes, it was NOT
popular at the MIT AI Lab, Yale AI, or some parts of CMU AI.


Alan Kay 
Still trying to learn how to think better - 1y


Where does Alan Kay see computing in both science and industry going, in 
terms of systems and how we think about them? 


I've been much more effective at doing things than in armchair predicting, so I don’t
know what to say here. 


Over the last several years I've been asked to do a number of talks on “Software 
Engineering” (they are on YouTube), and each talk took on different aspects and 
problems, especially in industry (which is so large as to defy easy generalizations). Still, 
the parts of the latter I did look at seemed pretty far off from much better, much more
scalable approaches, and their influence on university “computer science” courses
seemed mostly disastrous.


As to “systems”, I think there has been very short shrift all around. In many ways, the
past of both CS and SE has drowned out good ideas in systems research and
also prevented needed inventions.


To just pick one area, “graceful scaling” — a proper area of systems research — is 
pretty much not found at all in industry. “Systems with integrity” — in a wide variety of 
meanings for this phrase — are essentially non-existent — witness recent meltdowns 
e.g. Facebook's. 


Systems meanings and reformulations etc is another big area that should be taken up 
in the large. 


So, the most I can say is that I don’t feel good about current practice (quite queasy
actually), but this is not much of a “prediction”.


The initial answer to this is given above. But I just realized that “science” in the
question might mean “Science” rather than CS. The former is about negotiation
between teased out phenomena and the models we make to try to explain the
phenomena: the phenomena are “external” and the models are “between our ears”.


As Judea Pearl has pointed out, ML is pretty much large scale “curve fitting” and that 
“correlation is not causality”. But the kind of loose coupling and the size of system and 
training sets today has allowed some progress to be made in previously vague areas 
such as visual and audio representations, and with various large data sets. 


Unfortunately, this has quite overshot where it is actually useful, to become a blinder 
more dangerous way to try to compute in general. 


Correlation is good for drawing attention, but it would be nice to see computing catch
up with real causality and causal thinking and reasoning. This also requires a kind of
loose coupling, but one much harder to set up and keep sane because of the much
more stringent modeling that is being done.


In theory — looking at our present from the past — today’s and tomorrow's 
computing could be really usefully put to work to start creating and exercising real 
semantic models of the complexities of our world — and to be able to tell us and 
teach us about itself. 


This fits into another large facet of “systems research” and it is a good one.


Alan Kay
Still trying to learn how to think better - 1y


What's the point of life? 


The point of life is to live life with a point 


Alan Kay
I am the Alan Kay in question. - 1y
Has Alan Kay voiced any opinions about cryptoeconomics, blockchains, and 
related "crypto" ideas currently gaining a lot of exposure? 


I haven't. I feel I need more supported opinions; right now I have “feelings”.


Alan Kay
Still trying to learn how to think better - 1y
Is it true that the first computer was a loom? How is a loom a computer? 


Here is an answer that specifically covers your question about looms: How was the 
first computer operating system made if there were no programming languages? 


Alan Kay
Still trying to learn how to think better - 1y

How and when did digital computers come into existence? 

This question needs to be crisped up a bit. “Digital” happened before computing. And
many computers were not programmable. Computing was done long before
computing machinery was made and worked.


Etc. 


Lots of people realized it was possible, and wanted it. I tend to give the prizes to the
first all around “everything” with both hardware and software, etc. I like three groups
of pioneers: Konrad Zuse in Germany during WWII, Maurice Wilkes at Cambridge, and 
Williams et al at Manchester, with the latter two groups setting up the earliest working 
foundations after the war. 


Alan Kay
Have designed a few programming languages - 1y

What do you think about the idea of writing at least parts of law (some 
legal acts) in a technical language that would actually be a programming 
language? (So "computable law") 


I got asked to give a keynote last year at the Stanford FutureLaw conference on
Computational Law (yes there is one). Start at 1:25 if needed. 


Alan Kay
Have designed a few programming languages - 1y


All of the existing programming languages are based on English. And some 
are far more problematic, as seen by the code poem "Black Perl." What 
would an inclusive, progressive programming language look like? 

Start by looking at “chimera” artificial languages, such as Esperanto or Interlingua. 
Personally, I don’t think this helps much if at all for programming.


Alan Kay
I am the Alan Kay in question. - 1y


What does Alan Kay think are the most interesting people, projects, and 
organizations working on education today? 


I'm not comprehensive enough to give a good answer. But the most interesting
computer related one is definitely Bret Victor's “Dynamicland” (check it out on
YouTube).


Alan Kay
Have designed a few programming languages - 1y


Why did the designers of Smalltalk decide not to have traditional operator 
precedence for basic arithmetic? Did this decision lead to a more efficient 
interpreter? 


For the same reason as Ken Iverson chose for APL. When you have an operator rich 
expression language, it’s better to have one rule than 100. Lisp did it the other way: by 
explicitly showing the bracketing of scope. 
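
The “one rule” can be shown in a few lines. The Python sketch below is an illustration only (it is not how any Smalltalk or APL interpreter is actually built, and the function names are made up); it evaluates a flat list of numbers and operators strictly left to right, as Smalltalk does for binary messages, and strictly right to left, as APL does:

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_left_to_right(tokens):
    """Smalltalk-style: each operator applies to everything to its left."""
    result = tokens[0]
    for i in range(1, len(tokens), 2):
        result = OPS[tokens[i]](result, tokens[i + 1])
    return result


def eval_right_to_left(tokens):
    """APL-style: each operator applies to everything to its right."""
    result = tokens[-1]
    for i in range(len(tokens) - 2, 0, -2):
        result = OPS[tokens[i]](tokens[i - 1], result)
    return result
```

For `[3, "+", 4, "*", 2]` the left-to-right rule gives (3 + 4) * 2 = 14 and the right-to-left rule gives 3 + (4 * 2) = 11. Neither evaluator needs a precedence table, so adding a new operator never means deciding where it ranks among the others.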


Alan Kay 

Still trying to learn how to think better - Updated 1y

Why did Dijkstra say that “Object-oriented programming is an 
exceptionally bad idea which could only have originated in California.”? 


When I heard this, I thought it was really funny for a number of reasons. Edsger and
I were friendly (via Bob Barton) and he loved to come up with snide funny comments.


However, he certainly knew Nygaard and Dahl (of Simula fame) and that they were 
Norwegian. 


He possibly knew that the earlier Sketchpad had been invented in Massachusetts. 
He knew, via Barton, that I had my ideas while in Utah. Etc.


Perhaps more interesting is that he was the inventor of the semaphore coordination
mechanism (ca. ’62), and a semaphore is an object that is an instance of an idea about
synchronization, each of which has a local protected variable and only two protected
operations. An irony is that this was a bit ugly to make “neatly” in the Algol of the day, and
in Simula (where the encapsulation could be violated). When added to an HLL, it was usually
done as a “feature” (a special kind of variable) rather than by thinking through the
larger consequences of being able to make many such useful things. (And Hoare’s
“monitors” were also objects, etc.)
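To make the "semaphore is an object" remark concrete, here is a minimal Python sketch: one protected counter and exactly two operations. The class name `CountingSemaphore`, the method names `p` and `v` (after Dijkstra's traditional letters), and the `demo` helper are choices made for this illustration, not anything from the original systems. Python's leading underscore is only a convention, so, much as noted above for Simula, the encapsulation here can be violated.

```python
import threading

class CountingSemaphore:
    """One protected variable and only two operations on it."""

    def __init__(self, initial=1):
        self._count = initial                 # the local protected variable
        self._cond = threading.Condition()    # guards _count and lets waiters sleep

    def p(self):
        """Acquire: block until the count is positive, then decrement it."""
        with self._cond:
            while self._count == 0:
                self._cond.wait()
            self._count -= 1

    def v(self):
        """Release: increment the count and wake one waiter, if any."""
        with self._cond:
            self._count += 1
            self._cond.notify()

def demo(n_threads=4, n_increments=1000):
    """Use the semaphore as a mutex around a shared counter."""
    sem = CountingSemaphore(1)
    total = 0

    def worker():
        nonlocal total
        for _ in range(n_increments):
            sem.p()
            total += 1        # critical section
            sem.v()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

print(demo())
```

Because the semaphore is an ordinary object, a program can make as many of them as it needs, which is the larger consequence that the "special kind of variable" approach tended to miss.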


The other fun side-note is that McCarthy had invented “fluents” a bit earlier, a much 
more useful idea than semaphores and monitors for avoiding race conditions, and one 
that was not well enough understood at the time to catch on. 


He did like to pull chains ... 


Alan Kay
Still trying to learn how to think better - 1y


How might computer science evolve to be worthy of winning a Fields Medal
or a Nobel Prize? 


Or vice versa? That would be interesting also. 


In any case, it is strongly rumored that Nobel didn’t like mathematicians, so for 
whatever reason there is no Nobel for math. 


And there is this to contemplate (which equates a variety of prizes that the Nobels 
don't cover): 


Heidelberg Laureate Forum 


Mathematics and computer science have evolved into a matter of utmost... 


https://www.heidelberg-laureate-forum.org/


Alan Kay
Have designed a few programming languages - 1y


What are Dijkstra's major objectives for the development of structured
programs?

It's very difficult to do sufficient justice to Edsger. Besides being a foundational 
computer scientist — in the original strict senses of the term — he was also one of 
just a few critical gadflies for our whole field, and especially both CS and SE. 


One way to think about a great person is that they are “great and interesting” whether
they are right or wrong, or in between. This is because it is their unique perspectives 
that really count: they help us see from directions we couldn't. 


Those whose tiny sense of self and outlook allow themselves to be insulted by the great 
gadflies completely miss what “great” and “gadfly” and “progress” are all about. 


One of the difficulties in dealing with the ideas of a great person is to be distracted by
some of the “really good stuff” they are able to accomplish (which tends to be more
rooted in an historical time) and to miss the “really great stuff” they are advocating
(which tends to be for much larger durations).


In other words, the bug is to mistake the artifact for the bigger underlying idea. This 
has happened with a number of greats, including Dijkstra, Bob Barton, Ivan 
Sutherland, Doug Engelbart, etc. 


One of Edsger's greatest lifelong interests was to find out how to design and write 
programs that really worked and were “bugfree” — including “knowingly and 
meaningfully” bugfree. 


Following the idea that a programmer *should* have enough of a goal in mind to 
both try to write a program to do something, *and* to be able to tell if it works as 
intended, what he wanted was to find ways to *get the goal out of the programmer's 
mind and into the code*. 


Most goals are predications, and computers present the twin problems of 
*representing* predications and *running* them. The systems that could do this really 
well — like Sketchpad — though incredible for any time — were not general purpose 
enough to do either comprehensibly enough. 


So the question is: how can you organize, and confine, what the computer does
to approximate “runnable math”? The vast degrees of freedom of computing are
fabulously powerful: how can we use as many as possible really safely?


Edsger came up with numerous approaches to this, many of them important. With
reference to the question here, he was not looking for a religion or for “golden
methods” (this was more the way Wirth thought about this).


You can learn a lot about the way he did think about things by looking at his 
extremely early Algol 60 compiler, followed by his approach to the THE operating 
system for the EL X8 computer. 


One of the big ideas of the 40s — especially 50s — is that the notion of a 
mathematical function could be imitated very well, if done with care, and — even 
better — that the idea could be widened to provide a machine independent 
abstraction for an idea, and especially for programs that could accomplish goals. This 
created ways to create a design language and scheme with separated concerns for 
design, but provided ways to combine the concerns in higher level more goal oriented 
ways. 


This quickly gave rise to several different kinds of programming cultures: one pretty
ad hoc, and one “like Dijkstra’s”, i.e. an attempt to be as “scientific” and “engineering”
as possible with respect to design and building.


The “like Dijkstra” one had the disadvantage of not knowing how to “do everything 
right” but it did have many starts to deep and correct criticisms of the other culture — 
which pretty much did everything “wrong” and were so blinded by pragmatism they 
couldn't see it. 


I think these two extreme cultures are very prevalent today, and both are quite
entrenched, for many reasons besides personality leanings. I think there is a lot
of “loss aversion” and “sunk cost fallacy” hurting progress on both sides as well.


A way to end this too long answer is to look at Edsger’s quip that “programming is 
logical brinksmanship”, and realize that the largest psychological difference between
the two factions is that his strict computing and engineering approach wanted to 
avoid brinksmanship via more care and art. The term “structured programming” as 
used by him was larger than what it degenerated into. 


A companion quote from his contemporary Tony Hoare is: "Debugging is harder than 
programming, so don't use all of your cleverness writing the program!" It's a 
reasonable generalization to say that many of the computer scientists, software 
engineers, and language designers back then were trying to find ways to write programs
that could be reasonably debugged. It's also fair to say that most programmers today 
code expecting their program to run and be easily debugged (this doesn’t work well). 


Another quote along these lines of Edsger’s that is quite telling 60 years later today is: 


At least in Holland, the intellectual level needed for systems design is in general
grossly underestimated. I am more than ever convinced that this type of work is
*just difficult*, and that any effort to do it with other than the best people is
doomed to either failure or moderate success at enormous expenses.


All the best systems designers I've known over the years have put a lot of effort into 
*confinement* — and have come up with numerous mechanisms both hardware and 
software — some of which are completely critical today. 


By and large, most programmers violate useful and needed confinements right and left
regardless of the tools they have (some of which can really confine well). Or what is
done is via extreme kludgery (naming 5 of these right now will help understanding of
this note).


The HW vendors especially recently have done very poorly dealing with and helping 
these issues. 


And — it would be great to have those who do realize the importance of 
confinements to devise much better tools that fit the needs and scalings of our time. 


This is already much too long, but I can’t resist telling “The Sphere Story”. One of the
lab managers for a few years at Parc was quite a character. He had been the Chief 
Scientist at a large company, and while there decided that he would like a stainless 
steel sphere on his desk for a paper weight. He ordered a 4” one from the shop and 
eventually forgot about it. 10 months later it showed up with an invoice for $10,000 
(about $85K today). After he screamed at them on the phone, they informed him that he
only said "4 inch diameter” with no tolerances, so .... 


One of the many reasons that Sketchpad worked so well is that Ivan Sutherland was
an engineer's engineer, and engineers work to tolerances. He decided to make
Sketchpad work within reasonable tolerances, e.g. by minimizing the least-squares error
fits of the multiple constraints. This allowed very difficult non-linear multidimensional
problems to be handled right off the bat.
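The flavor of "satisfy the constraints within a tolerance" can be sketched in a few lines of Python. This is not Sketchpad's actual solver: plain gradient descent with a numerical gradient stands in for it here, and the function names, step size, and the two example constraints (a point that should lie on the unit circle and on the line x = y) are all assumptions made for illustration. Each constraint reports how far it is from being satisfied, and the solver nudges the point to shrink the sum of squared errors rather than demanding that every constraint be exactly "true".

```python
def total_error(point, constraints):
    """Sum of squared constraint errors: the quantity being minimized."""
    return sum(c(point) ** 2 for c in constraints)

def relax(point, constraints, rate=0.02, steps=2000, h=1e-6):
    """Nudge the point downhill on total_error using a numerical gradient."""
    p = list(point)
    for _ in range(steps):
        grad = []
        for i in range(len(p)):
            up = p[:]
            up[i] += h
            down = p[:]
            down[i] -= h
            # Central difference approximation of the partial derivative.
            grad.append((total_error(up, constraints)
                         - total_error(down, constraints)) / (2 * h))
        p = [x - rate * g for x, g in zip(p, grad)]
    return p

# Two non-linear/linear constraints on a 2-D point; each returns its error.
constraints = [
    lambda p: p[0] ** 2 + p[1] ** 2 - 1.0,   # lie on the unit circle
    lambda p: p[0] - p[1],                   # lie on the line x = y
]

x, y = relax([2.0, 0.5], constraints)
print(x, y, total_error([x, y], constraints))
```

The answer is accepted when the residual error is small enough, which is the engineer's "within tolerance" rather than the logician's "true or false".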


Compare this with the desire of mathematicians to have “not-true” mean “false” 
(disclosure: I have a degree in pure math). I remember being shocked by many things
in Sketchpad — and this was one of them! 


And compare this with the amount of “noise-limiting and correction” in almost all 
other parts of computing. If you are “thinking system” you are going to have noise 
and will design the system to work in spite of the different kinds of noise present 
(including human beings!). It's that a very different kind of mathematical thinking has 
to be done besides “A is not not A”.


Note that this doesn't invalidate Edsger's large goals, but it shows how poorly the 
early conceptions of programming have scaled. This non-system view is present in 
most programming languages in heavy use today, and just doesn't match up to what 
is actually needed. 


Alan Kay
Still trying to learn how to think better - 2y
What were interviews like at Xerox PARC? 


In brief: in the first few years of Parc, Bob Taylor wanted “only top people” (meaning 
“really top people”), and wanted to eliminate rivalries as much as possible. 


So he set up a laborious process in which everyone had to interview a candidate, and 
then discuss, and take a vote that had to be unanimous for the candidate to make it. 
This took forever. 


Of course we all bitched about this, but Taylor's process really worked. Each new star 
was greeted with open armed warmth, and they immediately fit into the somewhat ad 
hoc non-managed research collaboration that Parc enjoyed. 


Alan Kay 
Still trying to learn how to think better - 2y


Why is human communication being advanced by so many revolutionary 
and very popular technologies while human collaboration isn't? In order to 
communicate, one must first collaborate, right? Why so much help to 
"share" and (almost?) none to promise? 


There is a lot of excellent content related to this question already on Quora. 


It's worth pondering two built-in human drives — competition and cooperation — 
and how they have played out in various cultures over history. We are social, and both 
want to (and generally need to) live in groups to survive. This requires degrees of 
cooperation. But then we generally compete at most levels. 


I think the large claim that most advances in human civilization have come mostly
from cooperation can be substantiated (and that various forms of competition are
mostly not positive, and even when they seem to be, are of less worth than
cooperation would have been).


And I think that we have to consider “human communication” to be much more than
getting signals of one kind or another from human to human. It's worth comparing 
the thought patterns of oral societies to those with writing and schooling: they are 
qualitatively different, and it is the latter that is associated with the building towards 
“civilization”. 


Most recent media technologies — from the telephone on to radio, TV, etc — have 
facilitated oral modes of thought, including the oral structuring of thought (e.g. chat 
and Twitter). I.e., the facilitation is of means that require little to no learning, and thus
require no major changes in skills/internal-structurings. 


One of the large shifts in perspective about learning media in the 20th century is the 
realization that it is the *difficulty of learning* that requires large changes of 
skill/internal structurings, and if this is coupled with new powerful ways to represent 
ideas, large positive changes can be expected. These insights were famously 
associated with McLuhan, but many others contributed — Innis, Havelock, Ong, 
Eisenstein, Postman, etc. 


Engelbart’s large contributions to the world included the idea that a reasonable 
“augmented human” is *not* created by just giving a powerful tool to a person. 


Instead, he pointed out that (a) “augmentation” included most strongly the
*non-optional requirement* for deep training and education to acquire deep skills in using
the great new powers of technology and (b) that since most things are done by 
humans in groups, it is critical to also “augment the collective IQ of groups” with 
training/education to use powers with collaborative tools. 


One of the underlying ideas here is that available powers plus the impulse to compete 
generally results in poor to bad uses of the powers. Simplistic image: cave people with 
atomic weapons (which is pretty much what we have today). 


Another way to look at this issue is that deep training/education is part of what 
“developed cultures” require in many different areas — including science, engineering, 
etc. and in developed music and arts. They all require quite a bit of education and 
training to deal with. They turn into trivial and often ultimately dangerous pop culture 
expressions and weapons without the deep prep. 


So, to gesture at the question, I would say that “human communication” with
respect to important discourse has been mostly retrograded by poor untrained uses 
of modern technologies, and that our stronger competitive impulses are 
overwhelming needs for cooperation. 


McLuhan pointed out in the early 60s that a networked world would produce a kind of
“global village” (which he was not in favor of) where, among many results,
everyone would know what everyone else was doing. A disastrous consequence
would be massive loss of identity, and fierce efforts (including war) to reclaim a
feeling of it. One of the manifestations would be demands to *participate* but not
coupled with interests to “cooperate”.


Alan Kay
I'm the “Alan Kay” in question (try Google for the usual misinformation) - 2y


Who is the person to whom the invention of the computer is really credited?
There are different views on this particular issue: Alan Turing, John von
Neumann, Charles Babbage?


Goethe once wrote: “We should all share in the excitement of discovery, without vain 
attempts to claim priority”. I like this because it invites celebration of many ideas from
many sources at many times. And it also still allows us to identify real “superheroes” 
whose efforts made significant qualitative leaps — these people have been critical for 
progress, and I think it is important to identify as many as we can.


It's also important to keep in mind that “computation” created the paths for 
“computers”. You seem to be asking about the latter, but it's the former that is more 
important (and which includes the latter). The former has to do with representations 
and manipulating them, including algorithms and notations — so coevolved with the 
development of maths — and the latter has to do with making machines that can help 
do this (and many analog means for computing can also be found in deep antiquity). 


I’m guessing that by “computer” you also mean “stored program computer”, which is
quite a subset, albeit an important one. 


For modern times ... It is possible — even likely — that Babbage and Ada had a good 
idea of the universal nature of being able to manipulate what they first thought of as 
representations for numbers. 


I like to start with Charles Peirce in the latter 19th century, whose analysis of Boole’s
book produced the universal logic operation of "NOR", and who said that you should 
be able to make a computing machine from these. 
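A short Python sketch shows why NOR counts as a "universal" operation: every other Boolean gate below is built from `nor` alone, and those gates are enough to make a half adder, the first step toward arithmetic. The function names are chosen for this example only.

```python
def nor(a, b):
    """The single primitive: true only when both inputs are false."""
    return not (a or b)

# Everything below is composed from nor() and nothing else.
def not_(a):
    return nor(a, a)

def or_(a, b):
    return not_(nor(a, b))

def and_(a, b):
    return nor(not_(a), not_(b))

def xor_(a, b):
    return and_(or_(a, b), not_(and_(a, b)))

def half_adder(a, b):
    """Add two bits; returns (sum_bit, carry_bit)."""
    return xor_(a, b), and_(a, b)

for a in (False, True):
    for b in (False, True):
        print(a, b, half_adder(a, b))
```

Chaining adders like this one gives multi-bit addition, so a machine made of nothing but NOR elements (plus memory) can compute.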


During that period, there were many paths taken to represent and remember ideas via
mechanisms (some of the earliest and most interesting ones were for the combination actions
of pipe organs). An electronic (vacuum tube) digital memory, the flip-flop, was made
by Eccles and Jordan in the early 20th century, and much experimentation was being
done using relays for both calculation and remembering.


I think Turing’s paper in the 1930s was quite significant, not for the math, but for
the essentially engineering approach he took, and especially the demonstration that 
very little mechanism plus general memory for “marks” could do everything. This 
strongly suggested how practical people should go about making stored program 
computers, and all that happened subsequently used Turing’s model. (What is called a 
“von Neumann machine” today is really a rendering of a universal Turing machine.) 


However, the work of Konrad Zuse during WWII seems to have been mostly motivated 
by his own ideas — some of them more advanced than the rest of the world. And 
many secret, practical and large efforts were made at the Bletchley Park code breaking 
center during the same time. 


So, two things to keep in mind: (a) there was a lot of experimentation going on
regardless of Turing and (later) von Neumann (for example, George Stibitz at Bell
Labs made an interactive complex number calculator from relays that could be used
via dial telephones), and (b) the first three stored program machines to work (not
counting Zuse) were all British, and had different architectures than the Institute for
Advanced Study computer proposed by von Neumann.


An interesting sidelight is that making the “simple general memory” turned out to be 
much more difficult than the “little mechanism” (early on using vacuum tubes). The 
latter were big, expensive, etc, but relatively easy to deal with. By contrast the various 
attempts at using mercury delay lines and persistent spots on CRTs were shaky and 
fraught with errors. It wasn’t until ca 1952-3, that Jay Forrester made coincident 
current core memory practical for the Whirlwind computer at MIT. This launched the 
computing industry. 


So let's just “share the excitement of discovery without vain attempts to claim
priority”!


Alan Kay 
Still trying to learn how to think better - 2y 


Why do some media amplify cognition more than others? 


A comprehensive answer is beyond the size of a Quora response. 


But we are “thinking” all the time, and part of our thinking is highly influenced by the 
information given to us by our senses: especially kinesthetics/touch, 
visual/configurative, symbols. The latter is usually via primary sense impressions via 
visual or auditory. 


A central question to ponder is whether “thinking” can be improved by practice and 
by method — in other words, it is an activity that has things in common with physical 
learning e.g. sports (and this seems to be the case). 


One kind of media that can be invented is that which tries to represent what we are 
used to getting from the world: for example, visual and auditory phenomena, perhaps 
in the form of paintings or movies or recordings. This is both of great convenience 
and also requires very little new learning to be done in order to make use of it. 


The next central question is to ponder a difficult to learn medium, that requires quite a 
bit of new method and skill and exercise and repetition (even some psychic pain). 
What state will one’s brain/mind be in after fluency is gained? 


Could the new mechanisms set up by learning to make the new medium fluently 
accessible also be new mechanisms for how thinking is done? 


That seems to be the case for reading and writing. Many studies over the last century 
indicate that literate humans/societies think qualitatively differently than oral societies. 
The biggest differences — by far — are found with the combination of literacy and 
schooling. 


There is a large literature about this including important contributions from McLuhan, 
Postman, Ong, Innis, Eisenstein, Cole, and many more. 


I should add in here a little about “non-conformal” visuals, such as diagrams,
graphs, schematics, etc. These have many of the properties of the “writing and reading 
of words”. They require a lot more learning and skill training, and what is built in the 
brain as fluency is attained, also seems to provide new ways for thinking about many 
ideas — and often new ideas themselves. 


(One of many fascinating excursions with these kinds of visuals was done by Ron 
Thornton at Tufts U in the US, where it was found that (a) a student's ability at reading 
and interpreting graphs predicted well their final grade in a first year physics course, 
and (b) that teaching students to get fluent in reading graphs vastly improved their 
overall performance (and grades) in the physics course.) 


One of many more facets about this large area is to ponder how humans respond to
“wall to wall environments”: such as cities, countries, cultures, etc. and to media
environments such as writings, television, social media, etc. In brief, we humans will try 
to accommodate ourselves to the norms of the environments we are embedded in. 
This is because we are wired genetically to respond to “cultures” in ways that allow us 
to live in them. 


So, it’s worth pondering the kinds of re-norming that happen to human notions of 
“reality” from being embedded in 24/7 environments. 


Simple answer to the question here is that all media amplify cognition but in different 
manners and directions, some of which increase “civilization” and some which are 
“anti-civilization”. 


It's worth learning which and making choices ... 


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 2y


How can I incorporate a historic/literary approach to a topic about
computer science? 


I like to bring in important events in the history of computing when writing about
contemporary issues. The main reason is that a very high percentage of computerists 
are quite ignorant of what has happened — especially since things got rolling in the 
40s. 


It has generally been the case that the reinventions and rediscoveries have been more 
like “reinventing the flat tire” than “reinventing the wheel”, and many important ideas 
from the past have not been rediscovered in any form, and perhaps, more 
unfortunate, many recent ideas have been worse than any from the past. 


If you are looking for examples, you first have to delve into the history of the field.
There are many starting places, but I think comparing what Engelbart wanted to do
(and what he and his group did do) in the 60s to the almost non-existent notions for
the WWW will surely give you pause. Berners-Lee has apologized for not knowing
about Engelbart, but this didn't help the last almost 30 years (for example, it was in
the OS of the Engelbart system that conferencing was not just sharing face to face and
voice and screen images but also providing access to all simultaneously for
interacting with the shared content).


As to the “literary” part of the question, I’m not sure quite what is meant — but one 
part of it could be to encourage computerists to learn how to write well, and 
especially to include computer processes and interactions as part of the media they 
write with. 


Alan Kay
Have designed a few programming languages - 2y


According to Alan Kay, LISP's metacircular interpreter serves as the 
"Maxwell's equations of software". What are then the Einstein's field 
equations of software? 


I think I said “Maxwell's Equations of programming” ... and I think this metaphor works
pretty well for the late 50s and early 60s when Lisp appeared. It showed “in one 
eyeful” a very powerful way to model a superset of the kind of programming that was 
being done then. 


Metaphors only go so far, so bringing Einstein in might violate the spirit of the original
observation. But one could imagine a similar deep succinct treatment of
“process-oriented programming systems” that would capture the essence of “internetworked
processes”.


Alan Kay
Worked at Apple (company) - 2y


What did people at Xerox PARC think of the work Apple was doing? 


Here is a different way things could have gone ... 


The three largest intertwined visions of the computer future we were committed to all
burst into view in 1962: Ivan Sutherland's Sketchpad, Doug Engelbart’s “Augmenting
Human Intellect”, and Licklider's “Man-machine Symbiosis” and “Intergalactic
Network” via ARPA funding.


This was all the same community. Sutherland was funded by MIT/SAGE, Engelbart's
was a proposal to the Air Force (later taken up by NASA, and then ARPA), and Licklider 
was asked in 1962 to use ARPA funds to do something about the future of interactive 
computing. 


By the time Parc was set up in 1970, and stocked with young researchers from the
ARPA community, it was getting to be time for another pass at both interactive
computing and the “Intergalactic Net”, partly because ARPA funding had been
curtailed via the politics of the Viet Nam war, and partly because an inflection point in
ongoing VLSI was just about to happen.


The basic sentiment at Parc was that we wanted to do “Engelbart, Sutherland, and 
Licklider for everyone, including children”. A number of systems considerations, 
especially UI ones, suggested that the larger schemes thought about for the
Internetworking of all should also be employed at the personal level (and that this 
choice would be better and better as Moore's Law moved along). 


The decision at Parc was to build “time machines” that would start appearing in 1973, 
but would be in the form of networked personal computers of the 80s ca 1986-88 or 
so. This would provide a platform to realistically solve many systems, apps, UIs,
graphics, and other problems that would pay off handsomely down the road. About 
2000 of these machines were eventually built. 


The catch was that it would be very difficult to do either “Sutherland” or “Engelbart” 
with these machines. For example, Sketchpad was programmed in terms of 
“executable solvable requirements” (which is just what the future demanded) but it 
would take years and more powerful machines than were feasible in the 70s to do this. 


And, one of many important features of Engelbart’s system was that any content could 
be completely shared real-time between any number of users, and all of them could 
touch and manipulate. This was built into the bottommost layer of the system and was 
a wonderful thing. Just how to do this gracefully on a distributed system was still an as 
yet to be invented scheme. 


These considerations upset Engelbart to the point of causing a schism in his lab at SRI. 
One group stuck with him and implemented their next system on a mainframe time 
sharing system (as the previous one had been) and this gave them what they were 
used to for a few more years. 


The other group — which could see that the problems had to be solved in a 
distributed way — mostly wound up at Parc. And the first versions of the distributed 
system were quite a bit weaker in many ways than the mainframe version, but 
stronger in a few important future directions. 


“All other things being equal” (but they aren't), Parc would eventually have produced a 
qualitative leap along the directions of Engelbart’s “Augmentation” (that's what we 
wanted). 


The silicon inflection point that allowed us to make our time machines also allowed 
simple inexpensive “8 bit micro” computers to touch the consumer and low end 
business markets. This started a wave in the late 70s that had pretty much nothing to 
do with “how computers should be used in society”. The people who made them and 
the people who wanted them had completely different motivations — and most 
especially completely different contexts and understandings about “computers for 
people”. 


This created a “societal normal” about every part of computing that was both low, but 
also touched a very large unsophisticated market place. 


A deep point is that the Parc and Engelbart approaches required more computing 
power than people were willing to pay for if they didn't understand what it was for. 
The weakness of the 8-bit and early 16-bit micros (and lack of microcoding) took 
programming back about 15 years into rather low level approaches. New 
programmers started learning C, etc., instead of the much higher level languages that
were needed. When Intel et al started to make processors that were aimed at C rather 
than the high level languages, the game was pretty much over. 


If you look at this from the “ARPA Dream” perspective, almost nothing from ARPA 
made it out except for things for which there was no competition. For example, the 
Ethernet was manifestly better than other attempts at local area networking. The 
Internet was manifestly better (and pretty much unique). Apple liked the UI work at 
Parc but — as Steve later said — missed pretty much everything else. 


The Parc work at the end of the 70s lacked a lot of prime needed features that 
Engelbart had already done. It is doubtful that Apple would have picked these up if 
they had been there. Microsoft was much less idealistic than Apple and just as 
unsophisticated. 


Different readers will have different reactions to this account. 


For my part, after I left Parc in 1980 and first went to Atari and then to Apple, I was
shocked at the naivete and general lack of knowledge and understanding of what 
computing was and could be all about — and especially the lack of any curiosity 
about possibly great ideas from the past.* 


The present that happened was not about great ideas, but about generally 
unsophisticated individuals wanting to do things that made them feel better about 
themselves and that they could sell to other unsophisticates (that is not a basis that is 
going to give rise to great ideas). 


Sic transit gloria mundi 


* note that Tim Berners-Lee did not know about Engelbart when he patched together 
what was to become the WWW. He has apologized for this since. But the fact that a 
computer person did not know prime history is as usual as it would be unusual for a 
physicist not to know about Newton and what he did. This is the difference between a 
“pop culture”, and a “developed culture”. And it has hurt computing terribly, almost to 
the death of its best ideas. 


Alan Kay
Still trying to learn how to think better - 2y

How did people at Xerox PARC navigate dealing with the corporate side of 
Xerox? 

There were different personalities at Parc with differing relationships and needs with 
corporate; likewise, Xerox was a large rich corporation with many divisions and 
somewhat like “England before the Magna Carta”. 


So some relationships were cordial, some were warlike, and quite a lot in between. 


I wasn't involved in most of this: I had a research group there, and lab managers,
especially Bob Taylor and Bert Sutherland, viewed a large part of their jobs to be 
finding ways to “let the researchers research”. 


I had pleasant relations with Jack Goldman (a New Yorker and a half), the Chief
Scientist of Xerox. I had many skirmishes with Horace Becker, the head lawyer, who
was both a bulldog, and one without understanding of how science makes progress. I
found out later that what I'd interpreted as “skepticism” from a Xerox planner was
actually deep dislike bordering on hatred.


In other words, the usual human circus that happens when some of the participants 
are not very high minded. All this intensified as Xerox moved from the “could do no
wrong” phase to starting to have difficulties in its markets. Money issues often bring
out the worst in limited horizons.


Alan Kay
Have designed a few programming languages - 2y

How was software complexity changing as new technologies were 
introduced from machine code, through assembler to high-level languages 
and beyond? 


A too short but pretty accurate answer is that HLL languages and architectures 
“almost always” lagged the demands of the software, and this resulted in more 
complexity than could have been the case. 


Some of the exceptions are interesting. E.g. the B5000 HL machine with its OS written in
ESPOL, an HLL systems language. And: the Alto and Smalltalk system at Xerox Parc were
designed together similarly to provide a match to the software requirements. I thought
Erlang was a good HL solution to the problems of the Ericsson switching system.


But most projects try to tackle newer larger problems with older, often obsolete 
languages and tools which add on a lot of gratuitous complexity.


Alan Kay
I am the Alan Kay in question. - Updated 2y


What is the idea that is better than semaphores from John McCarthy in the 60s?
KEYNOTE 1: Alan Kay - Rethinking Design, Risk, and software watching this 
on YouTube 38:00 


I've explained John’s idea of “pseudotime” in a number of Quora answers. Perhaps you 
or someone might locate these ... 


Thanks to Dane Clarke for these references: 


Here's a few that are fairly relevant: Alan Kay's answer to What's the best way of 
combining functional and OO programming in practice? 


Alan Kay's answer to What will happen if one rewrites the entire Linux using Smalltalk? 


Alan Kay's answer to Rich Hickey, creator of Clojure has said that object-oriented 
programming is unable to model “time” properly. What does “modeling time" actually 
mean? 


Alan Kay


Still trying to learn how to think better - 2y 


Why hasn't Xerox PARC explored the stylus in the Alto era? 


We did aim for the stylus at Parc, and this was from experience with the GRAIL system 
at RAND especially (and GRAIL featured dragging via a stylus). 


(My thesis project, the Flex Machine, also had a stylus/tablet that was less
expensive than RAND's and not nearly as good.)


The sensor technologies in the early 70s that were good enough for a stylus were 
much too expensive to fit into the “at least 100 Altos” plan (e.g. a commercial good 
tablet was about $7000, and the Altos were supposed to be about $15K - $20K total). 


A great solution — I'm pretty sure by Chuck Thacker — was to make a small mouse 
with high res shaft encoders (I think it was about 250/inch) and this could be used to 
draw with and also had all the advantages of a mouse. 


These were in great favor in my group. Eventually they got displaced by lower res 
easier to clean larger mice: they were not so good for drawing, etc. 


Parc also did a variety of touch sensors for displays — the Notetaker came with one, 
and there were experiments with Altos. These worked for coarse pointing and 
dragging but not for fine cursor control. 


The mouse for the Mac was mostly about reliability — it was large and low res. 


Alan Kay
I am the Alan Kay in question. - 2y

Why didn't Alan Kay show up when Adele Goldberg arranged the meeting 
with Carl Helmers in PARC? 


First I've heard of it. When was the meeting?


Alan Kay


Still trying to learn how to think better - 2y 


Why was there a mismatch between Xerox management and PARC? 


Parc was set up in mid-1970 as an initiative of the Chief Scientist Jack Goldman and 
backed by CEO Peter McColough.


Bob Taylor — former ARPA director, and funder of the ARPAnet — was hired ca Sept 
‘70 to set up the computing research lab. He was interested in this because Congress 
was in the process of putting the D on ARPA and changing the funding rules in ways 
that would hurt the edge of the art research that Taylor had previously backed. 


Bob had dealt with companies before, and one of his stipulations was a legal 
agreement that Xerox couldn't mess with the direction etc of research for 5 years. He 
then hired spectacularly well, and things got rolling. 


A few years later XDS — a mainframe company that Xerox had bought in the 60s — 
failed and Xerox had to write off a billion dollars (a lot in those days). This cost 
McColough his presidency (he remained COB), and a very poor choice was made
to replace him. Meanwhile, there were critics of the “elitist” Parc researchers, etc. 


The poor choice made things worse, and was eventually replaced. But in 1976, the 
desktop Japanese copiers — and a big surprise: the Kodak copier — hit the market 
and started to put big pressure on Xerox. 


The trend here was less concern with the future and more about protecting their 
“heartland business” (as they termed it). 


And the 5 year agreement was now up. 


More and more pressure of various kinds was put on any part of R&D that wasn't 
generating revenues. 


Taylor sacrificed his career to hold the ship together, and most of the edge of the art 
research at Parc was done successfully, and at “living lab" scales much larger than 
demos. 


Things were quite tough around the late 70s, and Taylor was eventually fired in 1982;
his entire lab resigned with him.


Alan Kay 
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 2y 


Is Alan Kay the most influential person in the CS and IT fields? 
I couldn't possibly be ...


Alan Kay
Have designed a few programming languages - 2y

How is it that "coding" is a superior term to "conceptualize", if they mean 
essentially the same? 


As far as my 60+ years of experience goes, the two terms don't at all mean the same. 


In the 50s and 60s especially, “coding” was the act of writing code, and was distinct 
from “programming” which included design, and often in terms of an abstract 
framework — such as a flowchart — rather than actual code. The results here would 
be considered more than a “conceptualization”. 


Typically, one started off as a “coder”, essentially hand compiling designs into working 
programs, meanwhile gradually learning how to design. 


Today, I think that most people use “coding” and “programming” interchangeably.


Alan Kay 
I am the Alan Kay in question. - 2y


What is Alan Kay's opinion on the Akka actor library? 


Hard to have an opinion without trying something major. 


However, I like the idea of “actors” for a number of reasons: (a) they are a reflection of
the way I thought about “object-oriented message passing systems” in the 60s, and
instigated Carl Hewitt’s work on them, (b) the actor research in the 70s went further 
than we did at Parc (our goals were to write whole systems for personal computing on 
the Alto, whereas MIT's were to do research). 


Both of these paths worked out (we did write the systems we wanted, and the actors 
work at MIT came up with valuable insights). 


A simple-minded theory of mine is that every research project needs a genius like Dan 
Ingalls, who can successfully turn big ideas into whole systems. The Smalltalk work at 
Parc benefited from this, and most other interesting languages haven't (e.g. the lack of 
whole systems for (say) Haskell, Clojure, Akka, etc). 


Alan Kay
Have designed a few programming languages - 2y


How do I master the art of Smalltalk?


Let's widen the question to: “How can I understand and use Smalltalk-like
language/systems?”


A few basic ideas here are: 


One way to model any system is via parts/wholes relationships where a part can be a 
whole system, and relationships are in terms of intercommunicating messages. 


If the fundamental building block is a “whole virtual computer”, then each part at any 
level can always be represented *and* the interior will always be safe from external 
meddling. 


(So far this gives us a software rendering of the Internet (which indeed is what Smalltalk 
was all about — it came from the same research community that invented the Internet, 
and its invention was contemporaneous).) 


We can get something like a programming language by choosing the form of the 
messages that are sent and how they are received. 


We can get as many parts as we need if we have the facility to “instantiate” any thing 
we make. 


We can use these facilities to make our programming language, development system, 
the systems we want to create, etc., all down to the metal. 


All of these facilities are constantly live. And it is possible to make the system out of 
itself in such a way that it can not only debug itself whilst running, but it can debug 
the debugger whilst both are running, and — like the Internet — it need never be 
stopped for any change or repair or addition. 


Now here's the tough part for most computer people at any time, and especially 
today — and that's “we still have to do good design!" 


A given Smalltalk — like Squeak or Pharo — will have a kernel like the above but will 
be furnished with a “library” of many prior design choices covering a wide range of 
needs and felt solutions to this and that. In today’s world “mastering” generally means 
learning all the choices (very similar to mastering an OS) so that as many of them as 
possible can be used going forward. 


I think this is a good ploy for about 5-10 years of a feature, but is a disaster for very
old features to have this power. 


And — for example — it would be crazy to use the definitional capabilities of such a 
system to simply model “bad old ideas” from previous programming experience. For 
example, an enormous bug in many so-called OOP systems is to use this great power 
to model old not well scalable ideas like data structures and procedures. 


It's important to understand that the perspective here is “easily universal” e.g. if you 
feel that a “functional style” is good, it is extremely easy to model that with this 
“material”. There are no incompatibilities with programming styles here (this is a huge 
misunderstanding in the “normal” workaday world out there). This is a “meta” 
approach, and you do need to learn to “think meta” in order to make the best use
of it (it is very hard to do good design in any area without allowing some “meta” to
be part of the thinking process!)


To me: “mastering Smalltalk” means understanding what it is as a fundamental system 
building material, and seeing how to make the most expressively powerful thing from 
this that then can be used to create desired systems for many kinds of uses. 


Smalltalk dates to the early 70s and its current general form to 1975-76. It ran on a
personal computer with 128KBytes RAM (half of which was used for the display) and a 
2MByte disk (Somewhere between a floppy and a hard drive). It is of interest today 
because the kernel ideas were aimed at “the ages” and the sets of choices for the first 
library of extensions was carefully done (much of the credit for how well this turned 
out should go to Dan Ingalls). 


As an example, here's a tribute I did for Ted Nelson a few years ago using a revival of a
70s Smalltalk. Everything in the system including tools, development, the equivalent of 
an OS, and the equivalent of “apps”, was done in a total of 10,000 lines of code. This is 
why we put so much work into the design of Smalltalk — we felt that the VHLL 
abstractions we could make would allow a lot of functionality to compactly fit into the 
Alto — and that the Alto microcode (1K instructions) could run the kernel fast enough 
to make the result usable. Demo/talk starts at 2:15 


Moore's Law has had 30 doublings since then (and secondary storage even more). 
This is roughly a change in scale of a billion (some of which was used to create 
personal computers of much lower cost than the Xerox Alto). Also, Smalltalk was done 
before the Internet got done, and this represents cosmic scalings. 
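
As a quick check on the arithmetic in the paragraph above, here is a minimal sketch that takes the "30 doublings" figure at face value (the variable names are just for illustration) and shows why that comes out at roughly a billion:

```python
# Figure quoted in the text above; everything else follows from it.
doublings = 30

# Each doubling multiplies capacity by 2, so n doublings is a factor of 2**n.
scale = 2 ** doublings

print(f"{doublings} doublings = a factor of {scale:,}")
```

The factor is a little over 10^9, which is where "a change in scale of a billion" comes from.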


What this means is that you are much better off thinking things through again from 
scratch. 


But let's suppose that despite this you want a “modern Smalltalk-like system” to 
“master”. 


To my eye, the kernel ideas have held up pretty well, but there are a few fundamental 
things that could be made nicer and more clear. For example, though Smalltalk can do 
any kind of messaging in a system, we didn't put much effort into multi-machine 
massively parallel facilities. More effort is needed today. 


The most expedient “bad” thing we did in Smalltalk was to use a relational scheme 
that heavily shared (borrowed from Lisp). This was coupled with “too many 
dependencies” etc. 


Today, one would start with “marshalled objects” that can be transported over the 
entire Internet, and find ways to keep them as general as needed when reified back 
into life. This is a really interesting design problem, and a very important one. 


One of the implications of scaling is that conventions that work locally start being of 
less use. For example, names are pretty local tokens; one needs search for global
scales (and search by properties rather than tokens). Both Smalltalk and Lisp have this 
idea at a fundamental level, but not much has been done at the programmer levels to 
make it useful. 


There are many more considerations that come with a "scaling of a billion plus the 
Internet”. I'll end with just one of these. Programming traditionally has been about 
specifying “hows” and leaving the “whys, and the whats” in the mind of the 
programmer. Encapsulation allows more of a why and a what to be captured. The 
programming of the future has to start with representations — understandable by 
programs — of “why” and “what” — and has to be much more about this than any 
apparent “hows”. 


This is a really important long known research problem for which almost nothing has 
been done or funded over the last 40+ years. 


Alan Kay
Still trying to learn how to think better - 2y


Are there any known jazz guitarists who prefer playing on nylon string 
guitars, and if so, why do they prefer it, the sound or the feel of the 
instrument? 


Sure: e.g. Charlie Byrd and more. 


I was a jazz guitarist who later learned classical, and much later started experimenting
with jazz on classical. Basically, with my small hands, it was tough to do a lot of the 
jazz “reaches” on the classical instrument, both via action and scale size. Also, some of 
the differences in the actual physical frets can add more difficulty. 


The reward is the reward of the classical guitar expression and its sound. Pretty much 
every physical property of a classical guitar is more difficult to deal with than with a 
well set up jazz instrument. 


The sound and expression possibilities started me looking for “crossover” instruments
with more accommodating overall properties — of which there are now many. E.g. 
Godin makes nylon string instruments with better actions, cutaways, etc. 


The best maker I found was Kirk Sand in Laguna Beach California, who makes superb
crossovers of very high quality. I wound up getting two of his instruments.


http://www.sandguitars.com/products.htm 

Eventually age got me and my left thumb joint wore out to “nothing” — some of it 
from the extra efforts no doubt. Now at age 81 I happily play the lute (classical),
starting because of the much lower action pressure, but winding up loving the 
instrument and its music. 


It happens that much of early classical music requires improvisation as part of the art, 
and this is especially true of the lute. In fact, 17th century music in general is rather 
“jazzy” in many ways, some just like 20th century jazz, and some that are parallels to 
20th century jazz. So it's been fun learning and adapting to this as well. 


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 2y


What is the main difference between a software engineer and a Computer 
scientist? 


Worth starting with “differences between Engineering and Science, in general”. 


Engineering was able to predate Science by thousands of years because “things can 
be made/repaired/etc. in principled ways” without having to understand just why a 
working thing works. “Cookbooks” of things that are known to work, serve well (as 
indeed they still do in cooking). 


Math started off very cookbooky also, and then was revolutionized by the pre-CE 
ideas of connected reasoning and proofs. 


Science started off very cookbooky also, and the word originally meant “gathering of 
knowledge” (often via rather indirect ways). Both Engineering and Maths eventually 
had big influences on what Science thought it was about. 


Two enormous leaps in the 17th century for what we call “Science” today were (a) 
Bacon's call for “a new science” that should be centrally about dealing with “what's 
wrong with our brains” via the invention of heuristics and technologies that can help 
us do better, and (b) the earliest really good scientific theories as (symbolic) working 
abstractions of what could be behind phenomena (Newton had predecessors, but he 
was so far off the charts that new charts were required). This started new qualitatively 
different paths, and a few more critical ones were added in the next few hundred 
years. 


For example, the clarification between what is “in our heads” vs “what's out there”. 
Einstein had a good saying for this: “As far as the laws of mathematics refer to reality,
they are not certain; and as far as they are certain, they do not refer to reality.” 


An important way to think about this is that “Science” is neither the math, nor the 
theories, nor the phenomena, but the process of negotiation between them. 


One of the byproducts of this “largest thought revolution” was that Engineering — 
which eventually would be revolutionized by Science — lagged more and more by not 
taking up Science as a main facet. Because Engineering is mainly about making things, 
and many things can be made without real understanding, it took quite a while for the 
engineering fields to even see what they were missing. 


Today, we can think of the larger picture as a kind of Venn diagram of Tinkering, 
Engineering, Mathematics, and Science. The isolated subjects have their own 
properties and appeal to different personality types, but in practice, most of the best 
practitioners in any of these are very well versed in the others. This is worth pondering 
deeply. 


One of the largest unifiers and lingua francas is *systems* — a large enough set of 
perspectives to constitute a whole area of its own. 


When something breaks, valuable information and insights are often a byproduct. 
Many things engineers do affect the safety of people, so the equivalent of the 
Hippocratic Oath in Engineering is: “the building must not topple, the bridge must not 
fall, the plane must not crash”. 


Good engineers try to break things under controlled conditions so they won't break 
when deployed. Good scientists are always trying to break their creations, but know 
that they never have the final word. 


Note: there can be “sciences of the artificial"* for example, a “science of bridges". Once 
engineers build a bridge, it exudes phenomena which can lead scientists to better 
notions/"theories” of bridges. A lovely thing about our own time is the extent to which 
engineering and science can co-evolve both technologies and theories. 


So: there can be a “science of computation” to go along with the many engineering 
activities that are required to make artifacts (both hardware and software). 


The first Turing Award winner — Alan Perlis — was a “large mind” — and pointed out 
that the best scientific conception of the new field was “the study of processes: all 
processes”, and that computers themselves were both part of the "process world” and 
also could serve as the “new math” for representing the processes and notions being 
studied. 


To me — an old fogey from the main research community back then — this seems like 
the best way to proceed for now (and this perspective really made a difference in my 
research community's inventions of much of the interactive computing, media, and 
networking technologies in use today). 


It's easy to see that this perspective has not been taken up by most of the people
doing computing today. The main activities are more at the tinkering and “making 
without real understanding” levels than they should be. 


The vast gold rush which started in the 80s has produced something more like a pop 
culture** with pop music and pop values (for example, just compare the ancient 
engineering morality of safety with the modern computing tendency to try to make 
things “from matchsticks and houses of cards” even though much deeper knowledge
on many of these issues does exist).


Part of the pop process was that many of the older meanings from back then got 
redefined downwards to the point where they no longer mean the same things. 


Examples are: “computer science”, “software engineering”, “OOP”, “AI”, etc.


Just as pop cultures can't deal with non-linear processes such as the pandemic and 
the climate, the coupling of Moore's Law to oversimplified notions of computing has 
produced not just a mess, but one that becomes ever more dangerous. 


* see the great book by Herb Simon 


** pop cultures can produce important things also — the big problem of our day is 
that commercialism has grown the pop cultures to be dominant rather than a small 
percentage for hobbies and harmless fun. 


Alan Kay
Still trying to learn how to think better - Updated 2y

How can I implement an online microtonal adjustment algorithm to get
more perfect harmonics with a method to avoid drift? 


We did just that at Parc in the 70s, but not very completely. Today it could be done
much better (and probably has been). 


The idea is to try to imitate what singers and players of instruments that can shift pitch 
microtonally (bowed strings, winds, etc.) do when they are playing chords "in tune”. 
The system needs to take into account the key, the held notes, the kind of chord, and 
what comes next. 


An example of a “problem” that has to be solved is: suppose you are in the key of C, 
the held note is a C, and the chord is an Ab7 (which would likely have a flatter version 
of the held C). The system should probably stay with the held C pitch and produce the 
Ab, Eb, and Gb a bit higher to get a nice major 3rd. 


If the next chord is a G7, we can recover to the center pitch we are using for the key of 
C major. 


But suppose the next chord is a Db major (so we just did a modulation). Now we are 
“drifting” if we relate this to the Ab7 (and we should, because Db major has an Ab 
note in it). 


On the other hand, if we did the look ahead to see if there is a modulation or not, we 
can figure out schemes to stay in the “key center pitches”. 


We can hear the wide 3rd in equal temperament (13 cents sharp) as a “pain”. Flatting 
this really helps matters. The slightly narrow 5th in equal temperament is less of a 
problem but we can usually widen it just a little here and there to get more consonant 
5ths.
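
To put rough numbers on the wide 3rd and the narrow 5th, here is a small sketch that compares equal-tempered intervals with their small-whole-number ("just") counterparts. It assumes the usual conventions of 1200 cents per octave, 5/4 for the just major 3rd and 3/2 for the just 5th; the function and variable names are mine, for illustration only.

```python
import math

def cents(ratio: float) -> float:
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

# Equal temperament: every semitone is exactly 100 cents by construction.
ET_MAJOR_THIRD = 400.0   # 4 semitones
ET_FIFTH = 700.0         # 7 semitones

# Just intervals taken from small whole-number frequency ratios.
JUST_MAJOR_THIRD = cents(5 / 4)
JUST_FIFTH = cents(3 / 2)

# Positive means the equal-tempered interval is wider than the just one.
third_error = ET_MAJOR_THIRD - JUST_MAJOR_THIRD
fifth_error = ET_FIFTH - JUST_FIFTH

print(f"ET major 3rd differs from just by {third_error:+.2f} cents")
print(f"ET 5th differs from just by {fifth_error:+.2f} cents")
```

The 3rd comes out between 13 and 14 cents wide, and the 5th about 2 cents narrow, which matches the point that the 3rd is the one that really needs flatting.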


A main principle is that we have a hard time hearing “nanotones” (a few cents off), so 
we don't have to get to perfect intervals to get much better audible results. This is 
what choirs, strings, and wind players do on the fly. 


It is likely that a good solution today would not be done just by an algorithm, but 
would also have stored a lot of common case solutions to the adjustments, and would 
use these when possible. 


My pipe organ has an unequal 17th century temperament to deal with the need back 
then for “Golden major chords" e.g. for Tunder and especially Buxtehude. So e.g. C 
major and D major are really in tune — and “Golden” — whereas there is no C# major 
(because there is no E# note — the F is too sharp). So C# F G# is really out of tune. 
I've been very surprised how many people I've demonstrated this to can’t hear it. 


Just to get a bit more detailed here. The general problems had been realized since 
Pythagorean times. Simply put, notes of the same pitch from different instruments
(human voice included) will have different timbres because the mix of harmonics 
(usually pretty much multiples of the fundamental frequency) will be different. For 
example, a stopped organ pipe will only have odd numbered harmonics in different 
mixes, a brass instrument usually will have a strong 5th harmonic, etc. 


When you try to make a chord from 3 different pitches you will get interactions 
between the harmonics of all the notes. If you play C E G, the E fundamental will 
interact with the multiples-of-5 harmonics of the C, and the G with the multiples-of-3
harmonics of the C. This could sound strained. A solution is to derive the pitch of the E 
and the G from the harmonics of the C. This will give a “golden major chord”. 


Sounds good, so get the rest of the scale likewise. For example, follow the 5ths — try 
this on a piano — and you will eventually get back to C. But note that, to do this, the 
notes on a piano cannot be harmonically aligned. Why? Because following the 5ths 
around harmonically is multiplying successively by 3 and octaves are powers of 2:
there's no power of 3 that is going to give a power of 2, so harmonically, the
“circle of 5ths” can't close. This drove the Pythagoreans — and everyone since — 
batty. If you shrink the 5ths enough to allow them to close, then all the 3rds are 
annoyingly sharp (welcome to equal temperament!) 
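
The failure of the circle of 5ths to close can be checked with exact fractions. This is only a sketch under the usual assumptions (a harmonic 5th is the ratio 3/2, an octave is 2/1, and twelve 5ths are compared against seven octaves); the names are illustrative.

```python
from fractions import Fraction
import math

fifth = Fraction(3, 2)      # harmonic (pure) 5th
octave = Fraction(2, 1)

# Twelve pure 5ths overshoot seven octaves by a small ratio,
# traditionally called the Pythagorean comma.
twelve_fifths = fifth ** 12
seven_octaves = octave ** 7
comma = twelve_fifths / seven_octaves

comma_cents = 1200.0 * math.log2(float(comma))

# Equal temperament closes the circle by shaving 1/12 of the comma off each 5th.
pure_fifth_cents = 1200.0 * math.log2(float(fifth))
tempered_fifth_cents = pure_fifth_cents - comma_cents / 12

print(f"comma = {comma} (about {comma_cents:.2f} cents)")
print(f"pure 5th = {pure_fifth_cents:.3f} cents, tempered 5th = {tempered_fifth_cents:.3f} cents")
```

The comma is a power of 3 over a power of 2, so it can never equal exactly 1. Spreading it evenly over the twelve 5ths gives the 700-cent 5th of equal temperament.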


An earlier ploy was to look at schemes that allowed good sounding major chords, with 
especially good thirds. This can be done on a 12 pitch per octave keyboard for a few 
keys (like my pipe organ I mentioned above).


What if you want more good sounding keys? Well you could split the sharps: so in the 
place where there is an F# key, you could have the front be that and the back part of 
the key be Gb. 


<to be continued> 


Alan Kay 
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 2y


What are the mental models, biases, and fallacies that software developers 
should know? 


Only 17? 


Alan Kay
Have designed a few programming languages - Updated 1y


Is there a more precise way to describe software than with programming 
languages? 

I'm using the definition that “a programming language is something that can take a 
representation and turn it into something that a computer can carry out”. But we 
could also think of a language that can describe processes that don't run on a
computer (this was the case for “APL” for its first 6 or more years). 


The form of the rest of your question is not clear enough for me — for example, it 
doesn't say whether the description is about some software or whether it is supposed 
to be convertible into the software in some way. 


Perhaps you could try reformulating the question .... ? 


Added June 2022 ... 
I think several of the other answers are quite good enough. 


But it's interesting to think about what “precise description” might or could mean 
here. I used to be a mathematician, and part of the college education was about how
many maths got much more precise starting in the 19th century (for example, calculus 
was pretty offhand about limits, convergence/divergence, continuity, etc.). Just what 
was meant by “non-rational”, countable and non-countable, etc. was murky. 


In parallel, many ad hoc workable things in engineering, eventually got much more 
precise — for example, sound recording was in use for quite a few years before much 
better ways to understand and describe what was going on were invented. 


A big fun thing about “digital” were the many ways to remove noise from noisy 
carriers, to get virtually 100% precision in creating markings. An expanding question is 
“what has to be done to get precise about what a marking stands for?” One pathway 
brings us back to mathematics, and whether what we can capture of what we mean and
intend with symbolic representations/definitions can stand enough on its own. 


A simple-minded answer with regard to computing is that a program in a
programming language only has a precise meaning with reference to what happens
when the program is executed (either relative to a machine or by a human in
possession of a “workably precise” description of the programming language). This
can be made to work pragmatically enough ... 


Alan Kay
Still trying to learn how to think better - Updated 2y


What does Alan Kay think about computer algebra system (e.g. SymPy and 
SageMath) usage in mathematics education, before graduate school? 
Would widespread use of powered cognition assists like these rot younger 
students’ brains as some people fear? 

“Education” and “Training” have some overlap — along the lines of “understanding” 
and “doing” — but the distinctions are quite useful to keep in mind. In all cases we 
want something beneficial to change/grow in the mind of the learner. 


This implies that we should always be asking the question “When should it be easy,
and when should it be hard?”. The former will not induce much change, whereas
carrying through with the latter will, perhaps at the cost of time and discomfort. 


In the 70s when only a few children were using computers, we asked the group we 
were working with at Xerox Parc what they thought about what the computers were 
helping them do. They discussed this quite a bit and decided that it would not be a 
good thing to use computers to automate the center of your interests, because it would 
get in the way of learning. 


Musicians are quite familiar with this, and many music teachers will encourage 
students to learn to sing, and a “singing instrument”, as well as the conveniences of 
instruments like the piano — which do a lot for you, but they also make it easy not to 
pay attention and learn important things about musical expression. Many of the 
difficult things in music are just the things one needs to learn in order to induce a 
more musical brain/mind. 


Another example is that it is well known that learning to fluently read written ideas
produces large differences in cognition compared to that found in illiterate societies. 


So, when it comes to mathematics, we have to use these principles to try to figure out 
“When it should be easy and when it should be hard". We don't want gratuitous 
difficulties, but we do need to figure out ones that will promote real growth in 
mathematical thinking — and if possible in useful mathematical knowledge and skills. 


In all cases, we also have to take into account the differences between Kahneman's 
“Fast thinking” and “Slow thinking”, especially that many of the ideas for which we 
have to use slow thinking to reach understanding, will also need training to develop 
fast skills in doing. 


When we have a “helper” in the form of a person or a tool, we have to figure out when 
we just want to be practical and “get an answer" or when we need to understand 
something deeply before using a special tool. Most of our species chooses the former 
— we are nothing if not expedient — and this is very often not a good habit to get 
into. 


Here the subject is mathematics, one of the few human inventions whose “knowledge” 
can be completely understood — and this is also one of its gifts: the clarity of 
understanding the relationships of abstractions without the murkiness of trying to use 
them in science to approximately describe the real world. If you don't understand, 
then you can confuse the language with the phenomena, and miss the murkiness that 
is the very core of sanity. 


Ameer Fazal asked in a comment below: 


So, it is better to learn arithmetic, geometry, algebra, trigonometry, analytical 
geometry, set theory, calculus and so on with good quality books, working out 
problems with pen and paper. And with people and tools for various other reasons 
but primarily somewhat in an autodidactic fashion. 


I think this question in the large has to do with what citizens in a democracy should be
required to learn as children. I think children do have rights regarding what they can
be coerced to do, but I also think e.g. that society has the right to try to teach all
children to read and write. This is a clear conflict of sorts, but I'm on the side of 
mandating the teaching of reading and writing as part of preparation for adult 
citizenship. 


I also don't see how an adult can function as a voting citizen without a threshold level
of science and maths, and this includes learning the habits of reasoning, questioning, 
understanding about “noise” (especially generated by our brains, etc.) 


It's interesting to try to work out what the threshold of these fluencies might include. I
don't think they are what schooling currently thinks about either maths or science — 
both would be quite different (and almost unrecognizable to most adults today). 


In this light, it is worth pondering the future of “algebra” — both in its school use, and 
also the much more illuminating ideas of “Algebras”, especially “Linear Algebras”. 


One starting place for a better “K-12 mathematics” would be to just look at the maths 
that provide the mapping/supporting representations for scientific models, with the 
entry being the world children experience. For example, Seymour Papert had many 
wonderful insights, including: that the experience of movement by a child traces out a 
version of Gauss’ “Differential Geometry Of Vectors” and this can be enhanced with 
toys, ways to talk about the movements, and ways to think “mathematically” about 
what is going on. 


For example, consider making polygons by acting out “go a distance, turn an angle”. 
One soon discovers that to make any polygon (in fact, any closed figure) the “total 
turn” will be a full circle of 360 degrees. (This is a more powerful theorem than the 
internal angles of a triangle, and easier to prove.) From this it is easy to figure out 
polygons, etc. 
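
The “go a distance, turn an angle” idea can be tried out directly. What follows is only a minimal sketch in Python (not Logo, which Papert's work used); the function names `walk` and `regular_polygon` are made up for illustration, and it assumes a simple, non-self-crossing path. It checks that turning a total of 360 degrees in equal steps brings the walker back to the start.

```python
import math

def walk(steps):
    """Follow (distance, turn_in_degrees) steps from the origin, facing along x.

    Returns the final x, y, heading (mod 360) and the total turn.
    """
    x, y, heading, total_turn = 0.0, 0.0, 0.0, 0.0
    for distance, turn in steps:
        x += distance * math.cos(math.radians(heading))
        y += distance * math.sin(math.radians(heading))
        heading += turn
        total_turn += turn
    return x, y, heading % 360, total_turn

def regular_polygon(sides, side_length=1.0):
    # The total turn has to be 360 degrees, so each turn is 360 / sides.
    return [(side_length, 360.0 / sides)] * sides

for n in (3, 4, 5, 6):
    x, y, heading, total = walk(regular_polygon(n))
    print(n, "sides: back at start?", abs(x) < 1e-9 and abs(y) < 1e-9,
          "total turn:", total)
```

The only thing the polygon builder needs to know is the total turn, which is the point of the theorem: the side length and the number of sides can be varied freely.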


This is one entry to thinking in terms of vectors, and it works powerfully in 1, 2, 3, 4, etc. 
dimensions. The genius of Papert produced here something that is simultaneously in 
the sensual world of the child, that is one of the main forms of mathematics used for 
science, and something that is in the reasoning wheelhouse of children for *really 
doing math* — as Papert noted, “Teaching children to be mathematicians as opposed 
to teaching them about math”. 


This is the same distinction between helping a child learn to actually play a musical 
instrument vs putting them in a “music appreciation” class. We should be able to see 
that — for many really important subjects — “standard schooling” is way off (and to 
the point that it actually hurts the children: i.e. it's quite a bit worse than “neutral”). 


Alan Kay 
Have designed a few programming languages - Updated 2y 
What was the world’s first high level programming language in 1957? 


Is this really two questions? 


The simple answer is FORTRAN: it was by far the most developed, polished, and 
optimized HLL as of 1957. 


At the next — less simple — level are quite a few things in the works. 


Besides what Grace Hopper was doing — e.g. with “FlowMatic” — “IAL” (what is now 
known as “Algol 58”) was in the works (and it was more “sweetly conceived” than 
FORTRAN). 


On Whirlwind at MIT, there was already a “quite HLL” *interactive* language that was 
pretty darn nice — called “George” — by Hal Laning and Neal Zierler (one history said 
it was implemented in 1952). 


Laning and Zierler system - Wikipedia 

Sanjay Vyas asked: There was also Plankalkül? 


Reply: I thought about that — and I'm a fan of Zuse — but decided to restrict to 
languages that were actually implemented on either side of 1957. Here’s a sentence 
from Wikipedia: 


“In a 1957 lecture, Zuse expressed his hope that Plankalkül, ‘after some time as a 
Sleeping Beauty, will yet come to life.’” (Plankalkül - Wikipedia) 


A strong argument in Sanjay's — and Zuse’s — favor is that Iverson's "A Programming 
Language” existed for years — and a lot of programs were written in it — before it 
was implemented. Most of us date this language to 1962 (the book) rather than to the 
late 60s when “APL360” was implemented. 


If we follow that line of reasoning, then we have to look at Ada's program for Bernoulli 
numbers, which was also just on paper. She certainly had a formal organization for this 
program. 


Alan Kay 
Have designed a few programming languages - 2y 

Is it possible to create imaginary atoms of matter on a computer using only 
software graphics? 

I typed “simulated atoms” into YouTube and got many different videos showing 
various levels of physical simulations of atoms rendered using “software graphics”. 


Several of these show actual constructed atoms from elementary particles. My favorite 
was the one that showed a proton’s quarks interacting with electrons. That was one 
visualization I hadn't seen or even deeply thought about before ... 


There are also some very nice crystal and biological renderings using the basic 
“chemical physics” of atoms. 


I don't quite understand your phrase “using only software graphics”. 


(a) at some point what the software does has to move through hardware for us to see 
an image, 


(b) the physical simulations are mathematical — like the viewing simulations — but 
using somewhat different math and physical modeling (but all can be done in 
software — and all can be done in either a regular CPU or in a GPU). 


Alan Kay 
Still trying to learn how to think better - 2y 


What do you think of RMOX operating system from U Kent in England? I 
am amazed it is 32 bit only and not 64 bit (maybe secretly used by England's 
government) https://www.cs.kent.ac.uk/projects/ofa/kroc/rmox-cpa.pdf 

This is a “CSP”-influenced design. I don't like to criticize unless I'm willing to do it in 
substantiated detail, so I'll just say here that there are other styles for dealing with 
concurrent modules that I like more. 


Alan Kay 
Have designed a few programming languages - Updated 2y 

I want to understand and learn Lisp. I saw one of your YouTube videos 
where you said few understand it even today. What is the best way to learn 
Lisp and the creation of really good software that like the internet is 
reliable? 


Lisp presents itself as “a kernel language with a library” and as a “metalanguage” and 
“metasystem”. It's the latter that I think few in computing — in terms of percentages 
— have taken the trouble to learn and ponder deeply. 


There are two main aspects to “meta” in our context here. 


1. The first is how to describe the important parts of a language in a small and 
direct enough way to provide illumination. McCarthy's original definition of 
Lisp in itself does this extremely well. 


2. The second important aspect of “meta” is “reflection” — which means that 
programs in the running language can examine the mechanisms that are 
making them run. Lisp is very amenable to this, and some Lisps have been 
implemented to be reflective in considerable depth (note that (1.) doesn't 
imply or require (2.)). 


In the context here (1.) will exhibit — “simply” — how to make a mechanism that will 
run any program in the language and will produce the same results. A Lisp 
“meta-program” for this is very small and clear. 
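
To give a feel for how small such a mechanism can be, here is a rough sketch of an eval/apply pair. It is written in Python rather than in Lisp itself, so it is not the “Lisp in itself” definition, only an illustration of its shape. The assumptions: S-expressions are nested Python lists, symbols are strings, closures are lexical (Scheme style), and the names `evaluate`, `apply_fn` and `GLOBAL_ENV` are invented for this example.

```python
def evaluate(expr, env):
    """Evaluate an S-expression written as nested Python lists and strings."""
    if isinstance(expr, str):            # a symbol: look up its value
        return env[expr]
    if not isinstance(expr, list):       # a number evaluates to itself
        return expr
    head, *args = expr
    if head == "quote":                  # (quote x) -> x, unevaluated
        return args[0]
    if head == "cond":                   # (cond (test value) ...)
        for test, value in args:
            if evaluate(test, env):
                return evaluate(value, env)
        return None
    if head == "lambda":                 # (lambda (params) body)
        params, body = args
        return ("closure", params, body, env)
    fn = evaluate(head, env)
    values = [evaluate(arg, env) for arg in args]
    return apply_fn(fn, values)

def apply_fn(fn, values):
    if callable(fn):                     # a primitive supplied by the host
        return fn(*values)
    _, params, body, env = fn            # a closure made by "lambda"
    return evaluate(body, {**env, **dict(zip(params, values))})

GLOBAL_ENV = {
    "car": lambda pair: pair[0],
    "cdr": lambda pair: pair[1:],
    "cons": lambda head, tail: [head] + tail,
    "eq": lambda a, b: a == b,
    "atom": lambda x: not isinstance(x, list),
}

program = [["lambda", ["x"], ["cons", "x", ["quote", ["b", "c"]]]],
           ["quote", "a"]]
print(evaluate(program, GLOBAL_ENV))
```

The whole “meaning” of the little language sits in two short functions that call each other, which is the property worth pondering when working through the manual.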


“Reflection” here can be approached by thinking about what kinds of live access are 
needed to allow a really good debugger to be written in the language so it can be a 
live process in the language. (Note that JS — which has some Lisp-like features — is 
quite poor with respect to complete reflection.) 


A next stage of reflective thinking would be to ponder what it means to extend the 
core meanings and mechanisms of the language at the same level as the existing 
kernel. Some excellent thinking about this can be found in the next four paragraphs. 


i. The basic idea is that you don’t need much to make a “universal computing 
material” and that can be used very compactly to represent itself in itself. This 
produces something that is both extremely powerful in its reach, and quite small 
enough to think about mathematically, as well as for writing programs. 


ii. I liked this basic approach very much, and we used it for Smalltalk “down to the 
metal” at Xerox Parc (we didn't base Smalltalk on S-Expressions, etc. — the treatment 
was in direct terms of “objects”). It was quite good with regard to both (1.) and (2.) 
above. 


iii. The “Art Of The Metaobject Protocol” does even better in some respects for “Lisp 
with Objects” — the early examples in the book are quite illuminating. 


iv. I also recommend Dave Fisher's 1970 CMU thesis “A Control Definition Language” 
for an advanced McCarthy-like meta and reflective approach to extensible languages. 


An historical approach to learning “deep Lisp” is to get the “Lisp 1.5 Programmer's 
Manual” and work through the first 13 pages under the direct tutelage of John 
McCarthy. Page 13 takes most people a few deeply concentrated hours “on a Sunday 
afternoon”. 


Here are a few books since “The Lisp 1.5 Manual” that also include good travels into 
“Lisp-meta”. 


A classic — and now (also) historical approach — for “Lisp” (now in terms of 
“Scheme”) is to get the Abelson and Sussman book “Structure and Interpretation 
of Computer Programs”, which has served as the textbook for learning this world of 
thought. This is longer and gentler (most computer people pronounce it “difficult”). 


There's a book “The Little Lisper” that many like. 
This book is quite good to take you through this kind of thinking: 
Essentials of Programming Languages 


The last part of the question is a bit trickier, and I don’t think I know of a book that 
gives a good account of what is known (for Lisp or any other language). There are 
several parts to this: (a) how to make an “eternally alive” system that basically can't 
crash and has enough meta to participate in its own improvement, and (b) how to write 
really good software. 


"(a)" has much to do with “confinement”, “fail-fast", and ability to always have 
processes that can look at other processes. The classic Smalltalks, and also Squeak 
Smalltalk — whose chief architect and builder was Dan Ingalls — can be studied. It is 
quite possible to go a bit further in a system like Squeak and even have it be able to 
replace its own kernel without having to stop it. 


“Meta” has gotten richer since the early days of Lisp. For example, Dave Fisher's CMU 
thesis on extensible control structures. I've mentioned Smalltalk. I like the motivating 
chapters of “The Art of the Metaobject Protocol” a lot. I like the way the LOCUS OS 
was organized. LINDA has a lot to ponder. And, especially, there is Erlang — especially 
its systems principles — to look at. I think the “simulation of time” ideas of McCarthy 
are key. 


Etc. 


“(b)" has a subjective side ("good" software needs “good” people), and also a number 
of principles. One is to separate meanings from optimizations in such a way that (i) the 
meanings/intentions are debuggable and are readily readable and understandable by 
the humans involved, (ii) the system will run as desired with the optimizations turned 
off (a supercomputer may be required). This is a computing version of the 
CAD<->SIM<->FAB process in established Engineering (with a Big E) disciplines. 


Alan Kay 
Still trying to learn how to think better - 2y 


Are business interests and principled engineering often at odds? 
Take a look at “Challenger” and Thiokol, and the 737 Max and Boeing — as starters ... 


Alan Kay 
Still trying to learn how to think better - 2y 


What would "software CAD/SIM" look like? 

Let's try as close an analogy to modern Engineering (with a Big “E") as we can. One of 
the processes is called CAD<->SIM<->FAB(rication), and the idea is to solve as many 
design problems as possible in the CAD<->SIM part. 


CAD is a set of tools for making progressively better models of what is in mind that 
can be put through their dynamic (and otherwise relational) paces via SIMulation. A 
key principle here is to capture the “meaning” of the model and debug it. In many 
engineering areas, the SIM part is computing intensive, and may require a 
supercomputer to pull off. 


The "FAB" part of this scheme involves much of the optimizations that are needed to 
reify the model in the world (and usually to manufacture and distribute many copies). 


The computing world spends quite a bit of its time and energy working directly in the 
FAB area. Once optimizations are successful — often quite a bit of work — it is then 
easy to make as many copies as desired. 


If we try to stay with this framework — CAD<->SIM would be the tools to make a 
working model of our idea, which perhaps will require a supercomputer for real-time 
response. The model should capture our goals and requirements in as simple and 
meaningful a way as possible. 


Some of the model code could be in the form of “constrained goals” (like Sketchpad 
or Thinglab or Cassowary), and some could be in the form of simple unoptimized 
programs. 


A simple example might be a component — server — that semantically does sorting. 
The model code would be very simple (whether as a constraint or a simple program). 
The SIMulation has to be powerful in order to be able to work on realistic sized 
problems. Note that the FAB part can be developed incrementally inside the 
component/server and could include dozens of different sorting algorithms and 
representations depending on the requirements (for example the system might want 
to use B-trees for some services). The model code can be run in parallel to check each 
optimization. A well designed system would be able to run with all optimizations 
turned off (this would be just the CAD/SIM part). 
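
As a rough illustration of the sorting example (a sketch only, in Python, with invented names `is_sorted_permutation`, `model_sort` and `SortingServer`; nothing here is taken from an actual Parc system), the “meaning” can be written as a checkable property plus a deliberately naive program, and any optimization can then be run alongside it:

```python
from collections import Counter

def is_sorted_permutation(original, result):
    """The 'meaning' of sorting: the same items, in non-decreasing order."""
    in_order = all(a <= b for a, b in zip(result, result[1:]))
    return in_order and Counter(original) == Counter(result)

def model_sort(items):
    """Deliberately simple, unoptimized model: repeatedly take the minimum."""
    remaining, out = list(items), []
    while remaining:
        smallest = min(remaining)
        remaining.remove(smallest)
        out.append(smallest)
    return out

class SortingServer:
    def __init__(self, optimized=None, check=True):
        self.optimized = optimized   # optional fast implementation (the "FAB" part)
        self.check = check           # run the model in parallel with the optimization

    def sort(self, items):
        if self.optimized is None:   # all optimizations turned off
            return model_sort(items)
        result = self.optimized(items)
        if self.check:
            assert is_sorted_permutation(items, result)
            assert result == model_sort(items)
        return result

print(SortingServer().sort([3, 1, 2]))
print(SortingServer(optimized=sorted).sort([3, 1, 2, 1]))
```

The server still works with `optimized=None`, just slowly, and a faulty optimization is caught by the model rather than by a user.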


Reality is more complex, but the principle of separating meaning from optimization is 
powerful, and the kind of system I'm describing is set up to keep them separate. 


We got to try some of these ideas at Parc — and found many of the principles in the 
analogy to work well. 


In any case, if you are going to have “requirements”, I think that they have to be 
simple, runnable and debuggable in order to be of much use — and this leads directly 
to the need for powerful models and SIM, in part because so much software today has 
real-time requirements as well as relational ones. 


Historically, it's worth noting that the early drive for higher level languages was to 
program in terms of models as much as possible — they were also called "Problem 
Oriented Languages” — and to have the compiler do the optimizations. This is hard to 
do for many problems, even if you can give the compilers help and hints. It is also 
hard psychologically for programmers not to sneak in optimizations intertwined with 
the meanings in the code (this soon pretty much disappears the meanings (the whats) 
in favor of the hows). 


The idea of separating meanings from optimizations (and to have mechanisms in the 
development languages for doing so) helps with the practical business of actually 
writing software. 


Alan Kay 
Have designed a few programming languages - 2y 
Has Lisp ever been implemented iteratively? 


I'm not sure I understand this question clearly. 


But the first implementation by Steve Russell — and many since — was done by 
writing a loop that manipulated structures in memory to create the equivalent of a 
recursive eval-apply. Take a look at the latter part of the “Lisp 1.5 Manual”, which 
discusses the scheme used with the 7090 at MIT (a machine which didn't have a stack 
mechanism (most didn't back then)). 


The entire machine code of the classic graceful Peter Deutsch Lisp implementation on 
the BBN PDP-1 is given in the book “The Programming Language Lisp” (Bobrow and 
Berkeley, eds.). The PDP-1 didn't have a built-in stack, etc. 
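
To make the general idea concrete, here is a small sketch of “a loop that manipulates structures in memory” standing in for recursion. It is in Python and is not a reconstruction of the Russell or Deutsch implementations; it only evaluates nested prefix arithmetic, and the names `OPS` and `eval_iterative` are invented for the example. Two explicit lists play the role that a hardware stack would otherwise play.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_iterative(expr):
    """Evaluate nested prefix expressions such as ["+", 1, ["*", 2, 3]]
    with a single loop and explicit stacks, using no recursive calls."""
    work = [("eval", expr)]    # pending tasks, in place of the call stack
    values = []                # results of finished sub-expressions
    while work:
        kind, item = work.pop()
        if kind == "eval":
            if isinstance(item, list):
                op, *args = item
                work.append(("apply", (op, len(args))))
                for arg in reversed(args):   # so arguments come off left to right
                    work.append(("eval", arg))
            else:
                values.append(item)          # a number is its own value
        else:                                # "apply": arguments are on top of values
            op, argc = item
            args = values[len(values) - argc:]
            del values[len(values) - argc:]
            result = args[0]
            for arg in args[1:]:
                result = OPS[op](result, arg)
            values.append(result)
    return values.pop()

print(eval_iterative(["+", 1, ["*", 2, 3]]))
```

Because the pending work lives in ordinary lists, the nesting depth is limited by memory rather than by any built-in call mechanism.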


Alan Kay 
Have designed a few programming languages - 2y 


What language was ALGOL 58 written in? 


“BALGOL” (Burroughs Algol) was a version of Algol 58 that was implemented on the 
Burroughs 205 and 220 computers in pretty much the same machine code — the two 
machines were quite similar (good portions of this by Don Knuth). 


Alan Kay 
Still trying to learn how to think better - 2y 


How does one get as close as possible to mastering software engineering? 
“Engineering” is making something in a principled way — so there is definitely a 
“software engineering” of a sort. 


The idea was articulated in the 1960s as an *aspiration* because the “principled ways” 
known in computing did not compare strongly with where the established 
engineering disciplines — civil, mechanical, aeronautical, electrical, etc — had gotten 
to. 


Today, many computerists will say “software engineering” is defined by just what it is 
that they do (ditto with “computer science”). By removing the *aspiration*, they also 
have removed much of the motivation and ability to see, criticize and improve. For 
these people, “mastering software engineering” means “getting up to speed with 
current practice”. 


As an old fogey, I prefer the earlier, sterner and more enlightened stance of 
*aspiration* and like to compare and think about how making software could be 
brought more to parity with established engineering. 


One property of the established fields — “Engineerings” (with capital "E") itself — is 
that they were completely revolutionized by the invention of science. The takeup of 
science into engineering took quite a while — a lot was slow and grudging over 150 
years — but the big “E” is now an amalgam of traditional tinkering, making with 
principles, mathematical models and modern science. The latter two, especially the 
latter, have vastly changed what “making with principles” means and “the principles 
and tools” used. The advent of the computer — especially powerful interactive 
graphical computing — came along at just the right time to supercharge this change 
with powerful CAD tools, SIMulators of the CAD models, and increasingly FABrication 
driven directly from the designs vetted by simulation. 


It's worth noting the deep irony that the new computer tools for the engineering 
disciplines are almost always more comprehensive than the ones found in use by 
computer people for writing the programs! (There are a lot of "black screen simulated 
card-deck-glass-teletype” screens in use, in gross contrast with e.g. how something in 
EE or ME is designed and made today.) 


Going deeper, most software people are just trying to do FAB, and most of the tools 
are FAB tools — there is very little CAD and even less SIM in “software engineering”. 
To my old eyes, this doesn’t look or feel like a real engineering process. 


From another perspective, I don't see a lot of “math and science” in the current 
“software engineering” process. I will leave expansion of this aside here with just “it 
doesn't help to have ‘computer science’ these days not really take the term ‘science’ 
seriously” — the whole phrase is perhaps more of a designer jeans label. 


Now let me pick on the first term in “software engineering”. Many software systems 
today are “legacy” in one form or another, and in any case are characterized by 
“rigidity” with respect to change (this is the opposite of “soft”) and by “fragility” with 
respect to safety and integrity (this is the opposite of what “E”ngineering is all about). In 
fact these are backwards. What we want from “soft"ware (and from understanding 
“soft"ware) is “extreme flexibility” over the lifetime of an artifact ("soft" is why we make 
programmable computers) with “extreme safety and integrity”. 


This combination has possibilities of being radically extended for “Engineering of 
Software”, and I think our main job as “aspiring engineers” is to move software 
towards “extremely safe extreme flexibility”. 


I think this is a good starting place for rethinking all of this. 


I think “software engineering” — like the existing developed fields — should be 
actively aspiring to find “What Is Actually Needed” (WIAN) for both its projects and for 
the field itself. “Mastery” is then the process of continually moving closer to the ideals 
of the field — it is not a location. 


Alan Kay 
Had something to do with "Object-Oriented Programming" - 2y 


What are the limitations discovered in the early programming language 
that triggered the need for OOP? 


The earliest programming was in the forms of the earliest computers: to find resources 
in memory — usually numbers, or numbers standing for something (like a text 
character) — and doing something with them: often changing them or making 
something and putting the results in memory. Control was done by simple instructions 
that could test and compare, and branch to one part of code or another: often to a 
part of code that had already been done to create a loop. An instruction not in the 
hardware could be simulated if there was a way to branch and capture where the 
branch originated, thus producing the idea of “subroutine” (first used in full glory with 
a “library” on arguably the first working programmable computer, the EDSAC by 
Maurice Wilkes at Cambridge, late 40s). 


Beginning programming was and is most often taught in this style, and it has been 
noted that the first programming language and style one learns tends to manifest 
most deeply throughout the rest of a career. Not a lot has changed 70 years later, 
partly because many languages started off with this style in mind, and thus the new 
languages were attempts to make this style more convenient to use (Lisp and APL 
were different early interesting exceptions). 


Another way to look at this is to note that (1) the degrees of freedom of a computer, 
and of the possible problems to be solved, coupled with the limitations of the human 
mind, means that anticipating all the tools needed will be essentially impossible. This 
means that *how to define new things* becomes more and more important, and can 
start to dominate the “do this, do that” style. 


Along with this (2) soon came *systems* — dynamic relationships “larger” than simple 
programs. Programs are simple systems, but the idea doesn’t scale up very well to deal 
with qualitatively new properties that arise. Historically, this never quite subsumed 
“programming” (and the teaching of “programming’). It gave rise to a different group 
of computerists and did not affect “ordinary programming" very much. 


I think it is fair to say today that the majority of programmers reflect this history: most 
do not regard *definition* as a central part of their job, and most do not exhibit 
“systems consciousness” in their designs and results. 


I think quite a bit of this has to do with the ways programming is taught today (more 
about this gets even more off topic). 


Looking at this, the earliest real “computer scientists” could see that e.g. subroutines 
were an extension mechanism, but they were weak — for example, to make a new kind 
of “data structure” was fragile and could not be made a real extension to the 
language. This led to a search for “extensible languages”. 


Other computer scientists could see that “data structures” were not a great idea e.g. 
sending a data structure somewhere required the receiving programmer to know 
many details, and the structure itself might not fit well on a different kind of computer. 
A vanilla data structure was vulnerable to having a field changed by an assignment 
statement “somewhere” in the code by “somebody”. And so forth. 


Most of the programmers were used to the idea of commanding changes to “data”, 
and so some of the fixes were mechanisms that allowed data structures to be invented 
and defined: one of the major styles today is “abstract data structures”. 


Along with all this were several ideas for dealing with simple smashing of variables 
(and the essential “variable” that is a data field). This was scattershot and reinvented in 
different ways. The most prominent way in strong use today is for very large 
structures: “data bases” that are controlled by the intermediaries of “atomic 
transactions” and “versioning”, which effectively wrap the state with many procedures to 
ensure that a valid history is kept and relationships between parts of the data base are 
not violated. Eventually, it was realized that “data” didn’t capture all the important 
questions that could be asked — for example: “date of birth” could be “data”, but “age 
of” had to be computed on the fly. This was originally done externally; later, for some data 
bases, procedures could be included. (This required a “data base” to eventually be 
able to do what a whole computer could do — maybe “data” is not the operative idea 
here, but instead “dynamic relationships relative to time” works better. If so, then the 
current implementations of “data bases” are poor.) 
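
The “date of birth” versus “age of” distinction is easy to show in a few lines. This is only an illustrative sketch in Python; the `Person` class and its `age_on` method are hypothetical names, not part of any data base system discussed here.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Person:
    name: str
    date_of_birth: date      # this part can be stored as "data"

    def age_on(self, day: date) -> int:
        """Computed on request: a relationship relative to time, not a stored field."""
        had_birthday = (day.month, day.day) >= (
            self.date_of_birth.month, self.date_of_birth.day)
        return day.year - self.date_of_birth.year - (0 if had_birthday else 1)

ada = Person("Ada", date(1815, 12, 10))
print(ada.age_on(date(1852, 11, 27)))
```

A stored “age” field would be wrong the day after it was written, which is why the question has to be answered by a procedure living with the data.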


In computer terms, modern “data bases” are subsets of the idea of a “server”. 


Another line of thought — which goes back before there were workable computers — 
is that (3) certain easy enough to make computers can simulate any kind of 
mechanism/computer that can be thought of. This partly led to several landmark early 
systems such as Sketchpad, and the language Simula. 


If you take in the above, and carry it to the extreme, it's worth noting that only one 
abstract idea is needed to make anything and everything else: the notion of 
“computer” itself. Every other kind of thing can — by definition — be represented in 
terms of a virtual computer. These entities (I'm sorry I used the term “objects” for 
them) are used like servers, and mimic the behaviors (of literally any kind) that are 
desired. 


A key point here is that just having practical means for creating objects doesn't 
indicate what should be simulated with them. And here is where the actual history 
has been and continues to be unfortunate. The most use of the idea — still today — 
has been to simulate old familiar ideas such as procedures and data structures 
complete with imperative destructive commands to bash state. This again goes back 
partly to the way programming is still taught, and to the rather high percentage of 
programmers today who are uncomfortable with design and “meta”. 


For example, since “an object” is a completely encapsulated virtual computer, it could 
be set up to act like a transactional versioned data base. Or something much better 
and more useful than that. 
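
A very small sketch of that first possibility, in Python and with invented names (`VersionedCell`, `value`, `transact`): the object never lets an outside assignment bash its state. Callers can only ask for a value at some version, or request an update that is committed only if it completes. This is an assumption-laden toy, not a description of how any real system does it.

```python
class VersionedCell:
    """Hypothetical object that keeps its whole history instead of overwriting."""

    def __init__(self, initial):
        self._history = [initial]          # every committed value, oldest first

    def value(self, version=-1):
        """Return the value at a given version (the latest by default)."""
        return self._history[version]

    def transact(self, update):
        """Apply `update` to the latest value; commit only if it does not raise."""
        new_value = update(self._history[-1])
        self._history.append(new_value)
        return len(self._history) - 1      # the new version number

cell = VersionedCell(10)
version = cell.transact(lambda x: x + 5)
print(version, cell.value(), cell.value(0))
```

Everything interesting is in what the object agrees to do when asked, not in the layout of a data structure.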


Note that most interesting representations of things do “change over time” so 
something has to be done to deal with this problem. So-called "Functional 
Programming" has to add features — e.g. “monads” — to allow state to advance "in a 
more functional way”. This might not be the nicest way to deal with this problem, but 
something does have to be done. 


And note that if you have gotten religious about “FP”, then it is really easy to make a 
pure FP system and language by using the universal definitional properties of “real 
objects” (being able to define what you want is the deep main idea!) But before you 
do, it will be good to ponder in larger terms. 


As Bob Barton once remarked “Good ideas don't often scale” — and neither do most 
simple programming paradigms. This means that another of the new things that can 
be built with “objects” — but have to be invented first — are less fragile ways to 
organize systems. 


Along the Barton “qualitative changes” line of hints, one could start contemplating a 
kind of “request for goals” kind of organization where the semantics of the worlds 
being dealt with are more richly human and the main center of discourse is about the 
“whats that are needed” rather than the “hows” that the system ultimately uses. 


This was one of the impulses behind some of the HLLs in the 50s and 60s, but the field 
gave up too early. The original idea behind a “compiler” was to take a “what” and do 
the work necessary to find and synthesize the "hows" to accomplish the “what”. 60 
years ago the “whats” were limited enough to allow compilers to find the “hows”. But 
the field decided to sit on these and not uplift the “whats” that would require the 
compilers to do much more work and use more knowledge to synthesize the “hows”. 
This is another way to miss out on the changes of scaling. 


In a “real object language” — with “universal objects” — it should be possible to 
define new ways to program and define and design any new ideas in computing — I 
think this is necessary, and that it has to be done “as a language” in order to be 
graceful enough to be learnable and usable. 


Historically and psychologically, *tools* have had a somewhat separate status from 
what is made with tools (and the people who make tools, and make tools to help 
make tools, etc. are also somewhat separate from the average maker). But a computer 
is always also a tool making shop and factory: you don't have to go to the hardware 
store to buy a hammer, etc. This requires a change in mindset in order to really do 
computing. 


At Xerox Parc in the 70s, we made a “real object language” to walk both sides of the 
street (a) we wanted to invent and make a whole graphical personal computing 
system, and (b) we wanted to be able to easily remake the tools we used for this as we 
learned more. I.e. we wanted to "co-evolve” our ignorance in both areas to reflect our 
increased understanding. We were motivated both by “beauty” and that we had to go 
super high level in order to fit our big ideas into the tiny Alto. 


This process resulted in five languages, one every two years (thanks to the amazing 
Dan Ingalls and his colleagues), with one deep qualitative change between the 2nd 
and 3rd languages. That these languages could be useful “right away” was due to the 
way they were made (and partly because the languages contained considerable 
facilities for improving and changing themselves). To make progress on the personal 
computing parts, the constructs made in the languages had to be extremely high level 
so that the system could be rewritten and reformulated every few years. 


The 5th version of this process was released to the public in the 80s, and to our 
enormous surprise was not qualitatively improved again, despite the fact that it included the 
tools and the reflective capabilities to do this. The general programmers used the 
language as though it came “tight” from a vendor and chose not to delve into even 
higher level semantics that could help the new problems with the new scalings 
brought by Moore's Law. (This was critical because there were some things we didn't 
do at Parc because of the scalings that needed to be done to deal with “10 years later” 
scalings, etc.) 


To answer the current question after the “long wind” here: there are usually 
enough things “not right enough” in computing to need new inventions to help. Most 
people try to patch their favorite ways of doing things. A few will try to raise the 
outlook and come up with new ways to look at things. The deep “object” idea, being 
one of “universal definition” can be used for both purposes. Using it for the former 
tends to just put off real trouble a little bit. I think programming is in real trouble, and 
needs another round of deep rethinking and reinventing. Good results from this will be 
relatively easy to model using “real objects”. 


Alan Kay 
Still trying to learn how to think better - 2y 
Did Xerox PARC have time constraints? 


I was not aware of any during the period I was there, from late ’70 to ’81. 


However, we did often put them on our own projects — and for various reasons. For 
example, when we decided to make our own PDP-10 when corporate nixed buying 
one and wanted Parc to get a Sigma-7, the Parc feeling was that this would be worth it 
if the project was up and running in less than a year. 


For the Alto (which started off as a covert project) Chuck Thacker had a bet with a 
Xerox exec that such a thing could be done in 3 months (he won). 


This kind of voluntary internal time pressure for practical building of things did not get 
in the way of the freedom to spend arbitrary lengths of time “problem-finding” and 
mulling. 


The tensions between Parc and corporate were less pleasant than our previous 
experiences in the ARPA community, where the relationships between the ARPA projects 
and ARPA “management” were more informal. 


But the situation at Parc was overwhelmingly set up for positive processes nonetheless. 


Alan Kay 
Still trying to learn how to think better - 2y 


Did the Xerox PARC children have any problems context-switching between 
traditional math precedence rules and Smalltalk alternative left-to-right 
evaluation order? Would Mathematics change significantly if left-to-right 
evaluation were widely adopted? 

We made the decision not to have precedence for the very same reason APL did: if 
you have lots of operators, just remembering precedence starts to become a burden. 
The kids had no trouble dealing with the (easier) left to right “consumption” idea, and 
I never heard any complaints about the conflict with precedence in school. These were
12 and 13 year olds. 
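
To make the difference concrete, here is a small Python sketch of left-to-right “consumption” next to the school precedence rules. The function name `eval_left_to_right` and the token-list input format are assumptions made for illustration; this is not how Smalltalk itself is implemented.

```python
import operator

# Binary operators this sketch understands (an illustrative subset).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def eval_left_to_right(tokens):
    """Consume operands and operators strictly left to right, with no precedence."""
    result = tokens[0]
    for i in range(1, len(tokens), 2):
        result = OPS[tokens[i]](result, tokens[i + 1])
    return result

tokens = [3, "+", 4, "*", 5]
left_to_right = eval_left_to_right(tokens)  # read as (3 + 4) * 5
with_precedence = 3 + 4 * 5                 # read as 3 + (4 * 5) under school rules
print(left_to_right, with_precedence)       # 35 23
```

With only one rule to remember, adding more operators to `OPS` adds nothing to the reader's burden.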


Alan Kay
Have designed a few programming languages - 2y


How can a beginner develop thinking (computational and algorithmic) in 
programming? 

This question was asked by “Leonardo da Vinci”, and I wouldn't like to deprive
Leonardo of the pleasure of finding this out for himself. 


Now that we've had some fun with this, here are a few contextual perspectives to get 
you started ... 


1. “computational” and “algorithmic” have now become buzzwords, so a good 
starting place is to ignore these, and start with the term “thinking” (you 
could take a look at my Quora answer for my take on this term and how to 
develop it). Also, several of us wrote an ACM article with the title 
“Computational Thinking Is Just Good Thinking” (and the title makes the 
main point). One of many important ways to approach this is to realize that a 
threshold for modern thinking is to be able to know if you know enough 
to (really) critique the ideas you are dealing with, and to know how to 
approach learning what you need to know to critique. I daresay that most
people in computing today cannot adequately critique what they and the 
field are doing, and this means that it turns into something like creeds, then 
religions, and finally into “cargo-cults”. 


2. Historically, the hardware of computers was tiny, and what they were used
for was very limited. So learning how to come up with and program
algorithms in machine code was the prime thing to learn. Don Knuth’s
immense tome “The Art of Computer Programming” was just this: it used machine code,
and as he said, was partially a “cookbook”. But designing and making 
systems is the larger idea, and quite a bit of it is not “algorithmic” in 
nature. I suggest that you get some books about systems and start 
learning to think in terms of systems. This is a new kind of thinking that 
co-evolved with the new thinking of modern science. Learning both of 
these will help greatly. 


3. I would put quite a bit of effort into learning about and dealing with “noise”
of all kinds. This is a third kind of modern thinking, to go along with science 
and systems. In the large, “noise” is all the stuff at every level that you 
don’t want to have, and especially that interferes with what you do 
want. The two key ideas are (a) everything in the real world has noise — 
including both our biological and psychological brains — and (b) an 
astounding great thing is that if we can determine the maximum 
percentage of noise in a system, then we can use the very system to 
make processes that are free of noise. The “(b)” part was articulated by
Claude Shannon, and is one of the greatest creations of the 20th century, not 
just pragmatically, but philosophically. It started off being used in 
communications and computing hardware, but it is quite general, and is 
found in nature also, especially molecular biology. It is used in every modern 
engineering system in one way or another, except ... that very few computer 
people are willing to do the work to apply the ideas to software — so this is 
“a biggie”. 


4. Scaling is another deep problem for the human mind, and for systems in
general. We have to learn how to design for both growth and large size, and 
this means that approaches which work with small problems — and with 
most examples in programming courses — don't scale well at all. In many 
cases you can’t use “elementary computing” to do real computing. How 
to design systems that have thousands of requirements and constraints is 
another large branch of knowledge. Still, one of the best general discussions 
of this is Christopher Alexander's first book “Notes On A Synthesis Of Form”. 


5. Tony Hoare once remarked that "Debugging is harder than programming, so
don't use all your cleverness to write the program!" This is one of the main 
bugs in general computing today. You are much better off designing for 
failure and change rather than success and to the original objectives. 
Most of the cost in software systems (more than 85%) is incurred *after* they 
are successfully “finished” and deployed. Real computing has to deal with 
real “ality” and being willing to pay much more up front to save much more 
over the life of the software. One way to think about this is that general 
computing today has forgotten the meaning of “soft” in “software” and why 
we do software at all (it is not to make intractable “hard” masses of code 
that cannot be reformulated). 


6. A bit redundant — but perhaps useful to say it this way — one of the things
that was most helpful to the powerful ARPA/Parc computing community in 
the 60s and 70s, was that most of its members had deep training and 
experience with difficult already established fields which required much 
discipline to acquire fluency — such as the physical sciences, mathematics, 
and engineering. This helped tremendously in trying to understand what 
each of these terms, processes, and perspectives might mean for the new, 
wiggly, and prone to BS field of computing. I.e. what would happen if we
took “science”, “mathematics”, and “engineering” seriously within 
computing? Being able to do this is one of the main pillars of “thinking” 
and “thinking within computing”. 


Of course, there are more than 6 perspectives needed to become a decent 
computerist, but these are 6 important and non-ignorable ones that will start you off. 
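
As a small illustration of the “noise” perspective (item 3), here is a Python sketch of the simplest possible error-correcting scheme: a repetition code with majority voting. It is far less efficient than the codes Shannon's work makes possible, and the function names and the deterministic `flip` noise model are assumptions made for the example. The point it shows is only this: if the amount of noise is bounded (here, at most one flipped bit per block of three), a noisy channel can still deliver a noise-free message.

```python
def encode(bits, n=3):
    """Repeat every bit n times (n should be odd)."""
    return [b for b in bits for _ in range(n)]

def decode(received, n=3):
    """Recover each bit by majority vote over its block of n copies."""
    blocks = [received[i:i + n] for i in range(0, len(received), n)]
    return [1 if sum(block) > n // 2 else 0 for block in blocks]

def flip(bits, positions):
    """Model noise deterministically by flipping the bits at the given positions."""
    return [b ^ 1 if i in positions else b for i, b in enumerate(bits)]

message = [1, 0, 1, 1]
sent = encode(message)              # 12 bits on the wire
noisy = flip(sent, {0, 4, 8, 11})   # one flipped bit in each block of 3
print(decode(noisy) == message)     # True
```

If two bits in the same block are flipped, the vote goes the wrong way, which is why knowing the maximum noise level matters.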


Alan Kay


Have designed a few programming languages - 2y 


What are some examples that demonstrate the usefulness of homoiconicity 
in programming? 

I don’t think it is that useful per se (the most important aspects of a programming
language are (a) to have the most powerful semantic frameworks for expressing
meaning, and (b) to be as learnable and readable and “gistable” as possible for human
users).


The most famous example of “homoiconicity” is Lisp, but the language was not 
originally so. McCarthy thought and wrote in terms of what he called the m-language 
(which was basically normal math function syntax with square brackets, and arguably 
more readable — he thought it was). To do an interpreter of Lisp in itself, he wanted 
to have a representation for the programs in terms of lists, and chose what has 
become known as “Cambridge Polish", a prefix notation with parens to delimit the 
sequence of parameters — McCarthy called these "S-expressions”. 


When Steve Russell did a working interpreter for this on the 709 (or 90) he wrote a 
“read” routine — which translated text characters to lists — but “didn’t get around” to 
writing a simple parser for m-expressions into s-expressions. Thus the first — and 
most subsequent — Lisps were entirely in terms of what would be equivalent to ASTs. 


The hackers were happy to work in a kind of “machine code”, but there were several 
efforts to make a more readable Lisp. The official “Lisp 2” effort never quite made it. 
But there were several really good versions of “Algolic Lisps” over the years: two of 
them were at McCarthy's Stanford AI project: M-Lisp by David Canfield Smith and
Horace Enea, and Reduce by Tony Hearn. 


Note that even in the homoiconic s-expression Lisps, you have to use functions to get 
at the “functor position” and the parameters. These happen to be vanilla Lisp 
functions such as CAR and CDR — but you could just as easily write a function called 
“functor” that when applied to an expression, will find and deliver that part of it. 
Similarly for the arguments. And now you can do everything you used to be able to 
do, but now in a much more readable language. 


So, the key to this kind of power is not homoiconicity, but the “meta” of having 
expressions be 1st class objects in your language. 
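
A rough Python sketch of this point: Python is not homoiconic, but its standard `ast` module makes expressions available as first-class objects, so accessor functions can play the role of CAR and CDR. The helper names `parse_expr`, `functor`, and `arguments` are made up for this illustration, and `ast.unparse` requires Python 3.9 or later.

```python
import ast

def parse_expr(source):
    """Turn readable infix/call syntax into an expression object (an AST node)."""
    return ast.parse(source, mode="eval").body

def functor(expr):
    """Return the name in the 'functor position' of a call or binary operation."""
    if isinstance(expr, ast.Call) and isinstance(expr.func, ast.Name):
        return expr.func.id
    if isinstance(expr, ast.BinOp):
        return type(expr.op).__name__  # e.g. 'Add', 'Sub', 'Mult'
    raise TypeError("expression has no functor in this sketch")

def arguments(expr):
    """Return the argument sub-expressions as a list of AST nodes."""
    if isinstance(expr, ast.Call):
        return list(expr.args)
    if isinstance(expr, ast.BinOp):
        return [expr.left, expr.right]
    raise TypeError("expression has no arguments in this sketch")

e = parse_expr("area(s, s - a)")
print(functor(e))                                  # area
print([ast.unparse(arg) for arg in arguments(e)])  # ['s', 's - a']
print(functor(arguments(e)[1]))                    # Sub
```

The program text stays in ordinary readable notation; the “meta” access comes from the expression objects and the functions that take them apart.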


A really interesting set of tradeoffs exists for the processes of learning deep ideas 
about programming and programming languages. And these tie in with some of 
McCarthy's earliest ideas ca late 50s about “programming and computing in the 
future”. For example, he says in the very early “A Basis For A Mathematical Theory Of 
Computation”: 


Programs that learn to modify their own behaviors require a way of representing 
algorithms so that interesting properties and interesting transformations of 
algorithms are simply represented. 


So right off the bat he wants programs to be able to look at and “understand and 
reason” about programs as much as possible. 


The overlap between beginners and these programs is that the fewer concepts and 
relationships the better. This is where Lisp and its s-expression descendants provide 
the interesting tradeoffs between real simplicity vs a new notation that must become 
fluent. The Scheme (and later with the addition of Racket) people have provided 
plenty of evidence that this pathway works. Doing this with more normally readable 
expressions plus access to the ASTs — at least in universities — seems sparse to non- 
existent in my admittedly limited horizons. 


So it's possible that the homoiconic approach for meta is a kind of cognitive sweet 
spot. On the other hand (I think) you get something much nicer in many ways to go 
with the way Smalltalk dealt with reflection and meta. It is less of a hack etc. and 
requires a bit more learning before diving in. But this approach scales more gracefully, 
and starts with much more protection. 


Alan Kay


Have designed a few programming languages - 2y 


What caused the ADA to be created? 
The Wikipedia article is a good place to start: 


Ada (programming language) - Wikipedia 
High-level programming language first released in 1980 Ada Paradigm... 


https://en.wikipedia.org/wiki/Ada_(programming_language)


This was a big project with many people — quite a few of whom were really good — 
contributing many opinions and ideas — quite a few of which were really good. 


And then there were many “other people” with “other kinds of ideas”. 


Much can be understood about what happened by reading this webpage, which gives 
the initial “Strawman” requirements for what became ADA. (The progression of “mans” 
over the length of the project was: straw, wood, tin, iron, steel ...) 


I got to watch this happen in the 70s, and various “language people” from Xerox Parc
were also asked to comment and contribute (I went to a few meetings, and others
went to more of them). 


I liked “kernel extensible reflective, etc.” languages, and ADA was never aimed in this
direction, so my personal interest was not high — however, Dave Fisher — whose
1970 super-reflective thesis on extensible control structures I much admired, was a
powerful figure at DoD for this effort. And others I admired greatly — such as Bill Wulf
— contributed designs (the “Tartan” language proposal for ADA). 


There was a DoD side-condition that they could not be talked out of — Fisher couldn't 
either from the inside — that ADA had to run efficiently on every existing DoD
computing system. It was pointed out that Moore's Law (happening swiftly, whereas
language designs and implementations, especially by committee, happen slowly) would make it
ridiculous not to replace most of the DoD HW by the time ADA got implemented to a
useful state.


I knew and liked Jean Ichbiah, and thought his LIS language design was pretty good
(especially for something aimed at compiling into machine code). He eventually won
the contract for Honeywell-Bull (and ADA turned out to be a lot more intricate than
LIS was).


Bottom line: a lot can be learned here about language design, software engineering,
and especially how “whole-systems real computer science” could have helped. It is 
also worth noting that almost everything that was done in the 1995 revision of ADA 
was part of the original design discussions. 


As Robert Heinlein remarked, a really bad committee is “the only form of life with 10 
bellies and no brain". This wasn't that bad, nor was it as bad as a “typical committee”: 
“whose collective IQ is the minimum of the individuals in the group”. I think that ADA
turned out a lot better than the earlier comparable project of IBM — PL/1 — another 
language that took forever, delivered not so well, and is still around (I know of at least 
one major insurance company in the US that still employs 2000 PL/1 programmers!) 


Alan Kay 
Still trying to learn how to think better - 2y


To what extent does each invention require the invention of jargon for that 
invention? 


Great question! 


I like a process that seems generally unpopular, but has been extremely useful in
science especially: that is to try to coin special words for the special new meanings, 
and part of getting fluent in the new knowledge is to get fluent with the new terms. 


This is almost the exact opposite of normal language churn in commonsense cultures, 
where redefining old words to mean new things in their day and time is the general 
case. This makes language ever more metaphorical and idiomatic (which ironically also 
requires a lot of learning and memorization: but for new meanings rather than new 
meanings and a new term for them). 


This sets up deep conflicts between the two cultures. I don’t think the compromises
help either culture. 


Interesting examples abound in fields that are supposed to be technical — such as 
computing — e.g. the term “Artificial Intelligence” was defined by John McCarthy ca 
1956 and had quite a solid meaning (and very much based on the non-metaphorical 
combination of the two terms). Today, “AI” actually means such an extreme subset of
the original definition that the original meaning has been quite lost, and references to
the original meaning now have to use terms such as “Real AI”, “Strong AI”, etc. This is
also the case for many other terms in computing today, e.g. “computer science”,
“software engineering”, “object-oriented programming”, and many more. This is a mark
of a “pop-culture” (not quite a “cargo-cult” yet ...). 


Just to round this out ... it’s worth contemplating that science itself was initially also 
guilty of redefining earlier terms in attempts to make contact with outsiders, and 
reworked terms such as “know”, “proof”, “truth”, “theory”, etc in ways that still confuse 
non-scientists today. I think it is much better to just make up new terms for new
meanings to help people realize there are important differences which have to be
learned.


Alan Kay
Still trying to learn how to think better - 2y


What is the best book for Research Methods in Computing? 


It would be fun and illuminating to see the opinions of others — either as answers or 
comments here. 


The best book about the ARPA-IPTO/Parc research community (1962-1982 or so) is 
“The Dream Machine” by Mitchell Waldrop (it has been brought back into print by 
Stripe Press). 


This is an excellent history of those 20 years, with good historical accounts leading up 
to 1962. 


I wrote an essay tribute — The Power of the Context — to this community in 2004,
which might help a bit — and it does have an extensive bibliography. 


The methods used by this research community were quite effective, and the results 
include many of the fundamental technologies still in use today (Internet, Ethernet, 
personal computing, GUI, DTP, laser printing, etc.) 


The Waldrop book is mostly about *what*, but does give some glimpses of *how*. 


I got asked to do several lectures at Stanford a few years ago about "How To Invent
The Future”, and these have quite a bit more about methods used. 


Alan Kay
Have designed a few programming languages - 2y

Did Alan Kay give advice to Guido van Rossum on the design of Python? 
The language is said to have an object model that leans toward the original
late-binding, kaynesian definition, instead of the abstract datatypes route 
taken by e.g. Java and C++. 

I didn't give Guido any advice on the design of Python. The aims of Python and its
early history are easy to find with Google. Guido also wrote a blog with many articles 
about different aspects of Python history and development. 


Alan Kay
Still trying to learn how to think better - 2y

In software engineering, when might pragmatism be harmful in the short 
term? 

Almost always. As Don Knuth wrote long ago, “Premature optimization is the root of all
evil”.


If the term “engineering” is to mean anything in “software engineering", the higher 
priorities all have to do with the longer term life cycle requirements of the artifact — 
and include goodness of model, safety, testability, debuggability, understandability,
changeability, scalability, portability, and many more. 


One way to think of a well designed system is that the semantics will be separated 
from the optimizations as much as possible (one can imagine tools to help do just 
that) so that the system can be debugged and changed, etc., in terms of the semantics 
and will run — perhaps too slowly — with the optimizations turned off. 
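
A toy Python sketch of this separation, using a Fibonacci function as the example. The `OPTIMIZE` switch and the function names are hypothetical; the idea being illustrated is only that the plain definition carries the meaning, the optimization is layered on separately, and the system still runs (more slowly) with the optimization turned off.

```python
from functools import lru_cache

OPTIMIZE = False  # hypothetical switch: results must be identical when this is off

def fib_semantics(n):
    """The meaning: a direct transcription of the definition, possibly slow."""
    return n if n < 2 else fib_semantics(n - 1) + fib_semantics(n - 2)

@lru_cache(maxsize=None)
def fib_optimized(n):
    """An optimization kept separate from the definition: same results, cached."""
    return n if n < 2 else fib_optimized(n - 1) + fib_optimized(n - 2)

def fib(n):
    """Callers see one function; the optimization is selected, not baked in."""
    return fib_optimized(n) if OPTIMIZE else fib_semantics(n)

# The optimized path can be checked against the semantics at any time.
assert all(fib_semantics(n) == fib_optimized(n) for n in range(20))
print(fib(10))  # 55
```

Debugging and changes are done against `fib_semantics`; the optimized version is then re-derived or re-checked rather than edited by hand.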


Alan Kay
Have designed a few programming languages - 2y
Did Atari use the Forth language for any of their software? 


I was there from late 1981 to early 1984, and was not aware that FORTH was used in
any Atari systems. However, it is certainly possible that a use escaped my notice. 


Alan Kay
I'm the “Alan Kay” in question (try Google for the usual misinformation) - 2y


Is it important to turn what is actually needed into a pop culture? 


I think “just the opposite”. It’s important to turn pop cultures into “what is actually
needed”.


Alan Kay
Still trying to learn how to think better - Updated 2y
Did UNIX fulfill or exceed the vision of the Multics project? 


I thought Paul Green provided a good answer to a related question.


Why did Unix succeed and not Multics? 


https://www.quora.com/Why-did-Unix-succeed-and-not-Multics


The question here is about "the vision” of the two systems. | think that the Multics 
effort always had a larger “vision” about “why?” and “what is actually needed” (large
parts of the vision were “The ARPA Vision"). So my direct answer to the question 
would be “No”. 


(Added Aug 7th — I realized that I should have mentioned that MULTICS didn't
fulfill or exceed the vision of the MULTICS project either ...) 


The approaches to implementation were also very different. 


My opinion of Multics was that it (a) chose a poor hardware framework, and (b) 
perhaps overacademized the software design. 


My opinion of Unix was that it (a) also chose a poor hardware framework (but a very 
different one), and (b) this and other factors rendered the software design too 
simplistic. (As with many artifacts in computing, “(b)” allowed something to get 
implemented anyway.) 


Both systems did have a few strong ideas and principles. 


My main opinion was that there were at least two alternatives to look at that were 
much better bases than either Multics or Unix:


1. The Burroughs B5500 type architecture, including multiple CPUs, fine grain 
protection, HLL OS and implementation languages, etc. 


2. The “Project Genie” type architecture used in the ARPA community (that 
morphed into Tenex, and another sprout that was the CAL-TSS approach). 


The 70s allowed practical microcoding and VLSI RAM. This was too late for Multics, 
but the Unix folks could have taken this route. 


By the end of the 60s it was clear (to some of us at least) that the B5500 approach 
would be “just wonderful” if it were microcoded. Parc indeed did all of its hardware 
using microcode, including several mainframes, but Parc's aim was personal 
computing, not time-sharing, so the mainframes were mostly used to emulate some 
of the hardware the external community was using (such as PDP-10s). The most 
interesting TSS use was "D-Lisp” of Warren Teitelman and Bob Sproull, which ran 
Interlisp on the mainframe and used Altos as windowed graphic terminals. This was 
quite good. 


In my view, one of the first really great systems that was a modern version of the 
“ARPA vision” was Gerry Popek’s distributed multi-HW “LOCUS” OS in the 80s. This
was really good. (It did use a modified Unix, but this was not a critical component of 
the design: any encapsulated process architecture would have worked, and some 
would have worked even better.) 


Alan Kay
I read a lot - Updated 2y

What are some good books about what is actually needed in software 
engineering? 

I've been trying to find some (I got invited to do a keynote at a large upcoming
Software Engineering conference, and have been looking to see if there is more of a 
consensus and higher level view already in the literature). 


So far ... one observation of what I have looked at is that “what is actually needed” to
deal with the scalings from both Moore's Law and hardware engineering doesn't seem
to be treated. I.e. not just individual and super-computers and cloud computers, but
the billions of them linked by the Internet and other networks. (This is disappointing
because the original Internet design efforts were aimed at 10^10 and more scalings,
and were successfully pulled off by a group of top researchers.) 


I welcome helpful comments and suggestions!


Added Aug 12th: no helpful suggestions from readers yet. Does this mean you don't 
think there are any “what is actually needed” books for “software engineering"? 


Alan Kay
Have designed a few programming languages - Updated 2y
What is your review of The Pragmatic Programmer (1999 book)? 


I don't have a detailed memory of this 1999 book, but do remember
some impressions. 


The main one was that it — correctly I thought — addressed “programmers” in
general as people who had not learned or done engineering in their past or as part of 
learning to program. Much of it was like the advice a professional musician — 
especially a classical or studio musician — would give to pop culture teens and 
twenties who are starting to play instruments, but decline to take lessons, learn to 
read music, etc. I.e. “there are a bunch of things you really need to buckle down and
learn fluently in order to be a productive pro”. 


Added Sept 1 2021 (from a comment below) — OK, I found a pdf of the 2nd edition
and perused it. It has a lot of good advice (praising doesn't require a lot of
substantiation). The simplest criticism I can write here is that I think it has far too
many topics and too many styles to serve as a good starting place for beginners. I
think taking more of a systems approach would improve the presentation a lot. I
think the “pop culture” allusions (above) were too harsh (some tinges would remain
in a more balanced criticism). 


Another recalled impression is that — as with so many books on programming — it 
was very devoted to the general milieu of programming in corporate US, and the 
kinds of languages, tools, and approaches most in use. This is definitely an aspect of 
“pragmatic programming”, and a lot of the advice along these lines seemed sound for 
those who chose to cope. 


My view is that this kind of pragmatism is quite harmful in the long term, and often 
even in the short term. I.e. learning better ways to apply band-aids does increase
knowledge, but if the patient is suffering from suppurating wounds, then real medicine
and the learning and inventing of real medicine is what is actually needed (WIAN). I
don't recall any such rabble-rousing in necessary directions in this book.


I think the book became quite popular and successful.


Alan Kay
Still trying to learn how to think better - 2y


Can processes notify operating system that their stack and heap will need 
this much of memory prior to starting? Or is it operating system's decision 
to give a predetermined amount of space for heap and stack for every 
process? 


Here is one of the many great ideas in the “Project Genie” OS (lead designer: Butler 
Lampson, and later used to launch commercial time-sharing, and for the Engelbart 
“Mother Of All Demos” towards the end of the 1960s). 


The SDS computer was a 24-bit 0.5 MIPS machine with 64K words of storage and 
address space. The Genie project at Berkeley added a HW memory map with 2K 
pages, but the tiny slow resources were real barriers for practical time-sharing with 
multiple dozens of users. Both long swapping latencies and thrashing were deep 
problems. 


Butler decided to “force” working sets to be four 2K pages — so he could line up the 
next few working sets in storage to reduce process switching latencies to almost zero 
— and to reward programmers/programs that could predict the next working set. The 
reward was to have your process run on the fast queue. If you failed to predict, and had
a page fault, he would run you on the slow queue for a while.
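
The reward scheme can be sketched in a few lines of Python. This is a toy model, not the Genie implementation: the class and function names are invented, and `PENALTY_ROUNDS` (how long a mispredicting process stays on the slow queue) is an assumed value, since the text only says “for a while”.

```python
WORKING_SET_PAGES = 4  # from the text: working sets were four 2K pages
PENALTY_ROUNDS = 3     # assumed: time slices spent on the slow queue after a fault

class Process:
    def __init__(self, name):
        self.name = name
        self.penalty = 0  # good slices still needed before returning to the fast queue

def run_slice(process, predicted, touched):
    """Run one time slice; return True if it completed without a page fault."""
    assert len(predicted) <= WORKING_SET_PAGES
    fault = not set(touched) <= set(predicted)
    if fault:
        process.penalty = PENALTY_ROUNDS
    elif process.penalty > 0:
        process.penalty -= 1
    return not fault

def pick_queue(process):
    """Processes that predicted their working set run on the fast queue."""
    return "slow" if process.penalty > 0 else "fast"

p = Process("editor")
run_slice(p, predicted={1, 2, 3, 4}, touched=[1, 2])  # good prediction
print(pick_queue(p))                                  # fast
run_slice(p, predicted={1, 2, 3, 4}, touched=[1, 9])  # page 9 was not predicted
print(pick_queue(p))                                  # slow
```

Because the next working sets are known ahead of time, a scheduler built this way can line them up in storage before the process switch happens.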


This worked wonderfully well. The programmers did a lot more thinking about the 
dynamic state of their processes, and put in more SW instrumentation to help predict. 


The system overall was amazingly efficient, and could indeed run dozens of users at a 
time without latencies and thrashing. It's worth contemplating that the entire "Mother 
of all demos” was done with multiple users in real-time using this tiny little hardware 
of 192K bytes and 0.5MIPS. 


Another of the many ideas used in this system was aggressive “before the fact” use of 
the swapping drum memory. The idea was to have dirty bits on pages to mark if they 
had been changed, and to use about 2% of the CPU etc, to continuously write out 
dirty pages. This meant there was almost always enough room in core to bring new 
pages in when needed, without having to write out something first. 
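
Here is a toy Python model of the dirty-bit idea. The `Core` and `PageFrame` classes and the `budget` parameter are assumptions for illustration; the real system did this against a swapping drum with a small slice of CPU time.

```python
class PageFrame:
    def __init__(self, page_id):
        self.page_id = page_id
        self.dirty = False

class Core:
    """A toy model of main memory with a small background write-out budget."""

    def __init__(self, page_ids):
        self.frames = [PageFrame(p) for p in page_ids]
        self.writes_to_drum = []

    def write(self, page_id):
        """A program modifies a page, which marks it dirty."""
        for frame in self.frames:
            if frame.page_id == page_id:
                frame.dirty = True

    def background_flush(self, budget=1):
        """Write out at most `budget` dirty pages; meant to be called continuously."""
        for frame in self.frames:
            if budget == 0:
                break
            if frame.dirty:
                self.writes_to_drum.append(frame.page_id)
                frame.dirty = False
                budget -= 1

    def clean_frames(self):
        """Frames that can be reused at once, with no write-out needed first."""
        return [f.page_id for f in self.frames if not f.dirty]

core = Core([0, 1, 2, 3])
core.write(1)
core.write(3)
core.background_flush()     # writes out page 1
core.background_flush()     # writes out page 3
print(core.clean_frames())  # [0, 1, 2, 3]
print(core.writes_to_drum)  # [1, 3]
```

The write-out cost is paid a little at a time in the background, so a page fault rarely has to wait for a write before its read.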


This combined with yet another Butler perspective: something will crash — computer 
hardware was quite prone all by itself — so don't pretend that everything will work 
(and then get badly caught when a crash does happen). The aggressive flushing, 
prediction, and some other ideas not mentioned here, all allowed the system to 
resume within seconds of a crash. 


Margaret Hamilton, who was head of the software for the Apollo Guidance Computer,
called this kind of systems approach “before the fact” design (the “facts” being things
you didn't want to have happen). When asked what she was doing, she said “Software
Engineering”.


In the 70s at Parc, we used some of these ideas — and much of the overall philosophy 
— for doing the systems aspects of Smalltalk. For example, the aggressive flushing 
scheme worked well for the object level swapping to a much slower secondary storage 
we did on the Alto. 


It's worth noting that even today, most OSs are quite rooted in the early 60s with 
respect to their perspective of computing, hw and sw — the notion of an OS actually 
“thinking about what is going on and how to help” has never crept into standard 
notions of computing, and joins up with the early 60s nature of most programming 
that is done today. 


Why would people not do learning and take actions that would help them? A look at 
the pandemic — and many other situations in normal human life — reveals that, for 
many, “comfort even with high ultimate costs” is worth more than the “discomfort of 
learning and changing”. 


Alan Kay
Had something to do with "Object-Oriented Programming" - 2y

What is objective quality? 

Try typing it into Google and looking at the first hit (that’s what I did ...)


Alan Kay
Still trying to learn how to think better - 2y


How was personal computing influenced by the hippie movement? 


The larger category of interest here is “the counterculture”, of which the hippies could 
be considered one branch (or possibly a parallel “counterculture”). For example, I think
“The Whole Earth Catalog” is a “counterculture” medium rather than a “hippie” one. 
The “Free University” in Palo Alto and Menlo Park in the late 60s into the 70s was 
“counterculture” much more than “hippies”. 


One view related to your question can be found in John Markoff's book: What the 
Dormouse Said: How the Sixties Counterculture Shaped the Personal Computer 
Industry. This especially focuses on the Engelbart group at SRI, who were the epitome 
of ideas about personal computing in the 60s. Markoff was a student at Stanford while 
all this was going on, and he remains one of the most respected techno-journalists.


My view — from having started to visit Palo Alto in the late 60s and living there from 
1969 on — is that “hippies” had essentially nothing to do with the development of 
personal computing, but that the surrounding context of counterculture “helped”. You 
could look at my Quora answer about The Whole Earth Catalog for more info. 


What was the Whole Earth Catalog? 


https://www.quora.com/What-was-the-Whole-Earth-Catalog


Alan Kay


Had something to do with “Object-Oriented Programming" - 2y 


In the STEPS report, an unresolved problem of massive intermodule 
coordination and communication is mentioned. Is it more of a hardware 
problem, or a software/design problem? 


For a long time, this was a problem that never quite got worked on — it lost out to 
other problems — but one I thought “someone” should do (maybe us).


I tossed it into the list of techniques that would make STEPS interesting, and again, it
didn't quite make the cut during the actual work (though we did look at and build 
some infrastructure for subproblems along these lines). 


The basic insight was that extreme scaling was going to make “knowing the actual
labels” of useful things more and more difficult, and that something more like “searching
for things analogous to descriptions of structures/objects” would be more and more
needed.


A simple example: suppose you need the sine function but don’t know that it is
called “sine” (e.g. APL doesn't call it anything like “sine”). Ted Kaehler's “method finder”
can find it in Smalltalk, because one of the things it can do is to look for I/O
relationships to find the operation. So if you give it 30 and .5 it will find “degreeSin”. If
you give it 3, 4 and 7 it will find “+”, “or”, etc. If you give it an unsorted array
and then a sorted version of the array, it will find the sorting functions in Smalltalk. Etc.
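
A rough Python analogue of searching by I/O relationship. This is not Kaehler's method finder: the `CANDIDATES` table, the `sin_degrees` helper, and `find_by_example` are invented for the sketch, and a real system would search everything it knows about rather than a hand-picked table.

```python
import math
import operator

# Candidate operations to search (a small hand-picked table for illustration).
CANDIDATES = {
    "add": operator.add,
    "mul": operator.mul,
    "or_": operator.or_,
    "sorted": sorted,
    "sin_degrees": lambda x: math.sin(math.radians(x)),  # hypothetical helper
    "sqrt": math.sqrt,
}

def find_by_example(inputs, expected, tolerance=1e-9):
    """Return the names of candidates that map `inputs` to `expected`."""
    matches = []
    for name, fn in CANDIDATES.items():
        try:
            result = fn(*inputs)
        except Exception:
            continue  # wrong arity or wrong types: not a match
        if isinstance(result, float) or isinstance(expected, float):
            ok = isinstance(result, (int, float)) and abs(result - expected) <= tolerance
        else:
            ok = result == expected
        if ok:
            matches.append(name)
    return matches

print(find_by_example((30,), 0.5))               # ['sin_degrees']
print(find_by_example((3, 4), 7))                # ['add', 'or_']
print(find_by_example(([3, 1, 2],), [1, 2, 3]))  # ['sorted']
```

The caller never has to know the label of the operation, only an example of what it should do.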


One of the terrific theses — and people — we were inspired by during this time was 
"AM" of Doug Lenat. This was a “discovery system” that could “extract meanings from 
possibilities”. It had to have a description scheme, and I realized that something like it
could be used for the module problem. 


One of the things we spent some time thinking about at Parc and thereafter, was 
“Semantic Typing", which is another example. 


And in the 80s came Gelernter’s “LINDA” coordination language, that I thought was
really a neat way to think about the larger problems of messaging and concurrency, 
etc. 


I'm a bit surprised that no modern day Kaehler, Lenat, or Gelernter has stepped forth
to do this. It is both a really important problem, and the solution (I believe) would be
quite beautiful. 


It's similar to “communicating with Aliens” in the stars, but quite a bit easier. 


Alan Kay
Had something to do with “Object-Oriented Programming” - Updated 2y


Could some kind of analogy be made up between object-orientation and 
set theory? 


At Parc, we put some thought into this. One idea was to look at the relationships
between the set of objects you retrieve in a search and the set of objects that a Smalltalk
class generates, all of which obey the description in the class.


One way to look at this is that the objects of a class that haven't been instantiated yet 
could be thought to exist — in potential — in a kind of “Platonic Wilderness”, and that 
saying “new” to the class was a request to “find” one of these, and bring it out of the 
“wilderness” into more tangible “being”. Or, for example, that "3+4" could be thought
of as a kind of search request that will “find” a "7".


And Smalltalk actually had a message “allInstances” that could be sent to any class
and this would gather a set of all the extant instances ... 


In set theory there is a term “characteristic function” that is an expression which 
determines/defines what is in a set ... A Smalltalk class could be considered to be the 
“characteristic function” of its possible instances. 


A search of objects to retrieve a result set — on the other hand — is very “Aristotelian” 
(he liked to generalize from collections of examples) ... The characteristic function of 
these sets is the search criterion. 
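

One way to see how close the two kinds of sets are is to write both as predicates. This is a minimal Python sketch, not anything from the Parc work: `Point`, `class_characteristic`, and `search` are hypothetical names, and the "extant" collection stands in for everything currently alive in a system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: int
    y: int

def class_characteristic(cls):
    """A class viewed as a characteristic function: is `obj` one of its instances?"""
    return lambda obj: isinstance(obj, cls)

def search(objects, criterion):
    """A search viewed the same way: the criterion is the characteristic function."""
    return {obj for obj in objects if criterion(obj)}

extant = {Point(0, 0), Point(3, 4), "a string", 7}

is_point = class_characteristic(Point)
points = search(extant, is_point)  # plays the role of allInstances
far = search(points, lambda p: p.x * p.x + p.y * p.y > 1)
```

Both `points` and `far` are produced by the same mechanism; the only difference is where the predicate came from.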


What if you kept these live? What if you could “force” an object into one of these sets? 


In the spirit of “not multiplying entities unnecessarily” (what Occam actually said), it 
was untidy to have two things that were almost the same idea. 


I wrote a paper about what it might be like to unify the two kinds of sets. Some form
of this is still an interesting idea ... 


Alan Kay
Have designed a few programming languages - 2y

Can the concept of Unix pipes be added to apps and GUI's? 

The “pipe” idea in Unix was originated by Doug McIlroy in the early 70s, but the
notion of interprocess communication was much earlier, and one of the first systems
to use interprocess communication (and to invent “data-flow programming") was by
Bert Sutherland (Ivan’s brother) at Lincoln Labs ca 1965.


Parallel Data-flow programming with graphical programs ca 1965 


I forgot to say: so the answer is “Yes"!


Alan Kay
Have designed a few programming languages - 2y

What is the relationship among programming paradigm, object-oriented 
programming, and event-driven programming? 

“OOP”, especially in its original form, is a comprehensive way to define any 
computational idea by using intercommunicating entities that are a single kind of 
universal building block with the semantics of a whole computer. 


One of the most natural ways it has been used is to model complex systems, including 
complex software systems. 


In OOP, a message between entities is an “event”. 


To do “event-driven programming” one would organize one or more “event-driver 
objects", that entities which desire to be “event-driven” would initially subscribe to. 
When an event-driver object senses an event, it will send wake-up messages to all 
objects that have subscribed to it. 
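

The subscribe-and-wake-up arrangement described above can be sketched as follows. This is a minimal Python illustration; `EventDriver`, `Logger`, `subscribe`, `sense`, and `wake_up` are hypothetical names chosen for the sketch.

```python
class EventDriver:
    """Senses events and sends a wake-up message to every subscriber."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, obj):
        self._subscribers.append(obj)

    def sense(self, event):
        for obj in self._subscribers:
            obj.wake_up(event)  # the message send is itself the "event"

class Logger:
    """An entity that wants to be event-driven; it just records what it hears."""

    def __init__(self):
        self.seen = []

    def wake_up(self, event):
        self.seen.append(event)

clock = EventDriver()
first, second = Logger(), Logger()
clock.subscribe(first)
clock.subscribe(second)
clock.sense("tick")
```

After `clock.sense("tick")`, both subscribers have received the same wake-up message, and neither needed to know about the other.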


Alan Kay
Have designed a few programming languages - 2y

What's the difference between levels and paradigms in programming? Can 
we talk about declarative and imperative paradigms only when talking 
about high level programming languages? 


Most computers in history, including most today, are set up to have many of their 
lowest level instructions be “commands” (in other words “imperative"), and especially 
with regard to rewriting lowest level memories. 


A “declarative” form at the lowest level would be something in which the desired 
relationship of the result — but not how to compute it — is interpreted by the 
hardware. Note that this would be the case for most elementary “operations” such as 
"ADD", "SUB", "MUL", "DIV", etc. There are different ways to do these in the hardware, 
and most computers do not allow the programmer to say which or to provide 
alternatives. 


So I think we just did talk about “declarative” and “imperative” ‘paradigms’ without
mentioning “higher level programming languages” (until just now). 


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 2y 


What progress have computer scientists made toward solving the symbol 
grounding problem? 




Excellent question! Simple answers are “some”, “not a lot”, and “not a lot of work is 
going on these days on this problem”. 


(Note that David Vanderschel’s answer indicates that he thinks “imitations” by ML 
systems (ML as Judea Pearl points out is an extended form of "curve fitting”) are doing 
well enough. I don’t agree: I think that “organizations of organizations” will be
necessary for the lower level simplicities to be effective.) 


A preliminary question is “how do humans seem to do ‘symbol grounding’ ?”. Our 
brains are also isolated from the world we live in, and are connected to the outside 
only by simple representations of “phenomena”. We can see that, philosophically, the 
problems of grounding are rather similar — the big questions have to do with the 
kinds of organizations needed. 


A place to start that has considerable thought and many examples is the CYC system 
of Doug Lenat, which is a very large working model of an attempt to capture much of 
human commonsense reasoning and beliefs. It's worth poking into this work to see
how it accomplishes “grounding of its symbolic representations”.


Alan Kay
Still trying to learn how to think better - 2y 


Would a computer scientist from 1985 understand an iPhone and how it 
worked? Would they be able to take it apart and advance our civilization? 


1. Sure 2. Remove Twitter and Facebook? 


Alan Kay

Have designed a few programming languages - 2y 

What elements of syntax should always be taken into account? 

“Syntax” is the position of something relative to the position of something else. It can 
be any dimension. It’s usually thought of as 1-dimensional (as a string of characters or
words, etc.) but in a GUI, quite a bit has to do with nearness and enclosures, and 
occasionally overlaps (another .5 D). 


It is a very good idea to learn how to be very flexible with regard to syntax because 
this helps learning new organizations of things. (This seems to be a rather lost trait for 
most computerists today.) 


Alan Kay
Have designed a few programming languages - Updated 2y


How does a desktop environment work? How does a GUI work? Does the 
OS recognize where the mouse is and on top of what? Does it depend on 
the pixels color? 

Here is the entire memory of the Xerox Parc Alto computer in the early 70s showing 
every bit. You can see that half of it was used to hold the bits (1 bit per pixel) that
would show up on the 606x808 screen as black or white pixels.


And here's the Alto itself with that image on the screen. 


The CRT (TV like display) is constantly scanning from left to right and then moving a
line downward. Some software and a small piece of hardware are picking the bits of
the image from memory so that the beam can be turned off or on depending on
whether a pixel should be black or white.


While this is going on the physical mouse is also being tracked. When the mouse is
moved with one’s hand, a program adds the incremental changes in the up-down and
left-right directions to get a new screen up-down location (called the “y” location,
measured from the top of the screen) and a new left-right location (called the "x"
location, measured from the left). This is used by another little piece of hardware and
software to display the cursor image on the screen.


Meanwhile, there are other programs also running in the rest of the memory, and
some of these have structures that represent the rectangular images we see on the
screen. A rectangle can be represented by the x-y location of its upper left corner
(called the “origin” of the rectangle) and the x-y location of its lower right corner
(called the “corner”). The programs also know which rectangles are on top of or below
the others.


When you press on the mouse button, the program compares the mouse location to 
the list of rectangles, starting with the topmost one until it finds the first one that has 
the mouse located inside it. 


The comparison is simple: 

if mouse x is greater than origin x and less than corner x
and
if mouse y is greater than origin y and less than corner y
then the mouse is enclosed by the rectangle.


If so, then it wakes up the program that deals with that rectangle and asks it to find
what the mouse is pointing to *inside* the rectangle (these are also always
things with rectangular boundaries). At some point, a rectangle is found that knows 
what should be done when it is pointed at, and another program is awakened to 
cause the desired action. 
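

The comparison and the descent into nested rectangles can be sketched together. This is a minimal Python illustration and not the Alto's code: `Rect`, `contains`, and `hit_test` are hypothetical names, child rectangles are assumed to use the same screen coordinates as their parents, and each list is assumed to be ordered topmost first.

```python
from dataclasses import dataclass, field

@dataclass
class Rect:
    name: str
    origin: tuple  # (x, y) of the upper left corner
    corner: tuple  # (x, y) of the lower right corner
    children: list = field(default_factory=list)  # topmost first

    def contains(self, x, y):
        # The comparison from the text: strictly between origin and corner.
        return (self.origin[0] < x < self.corner[0]
                and self.origin[1] < y < self.corner[1])

def hit_test(rects, x, y):
    """Return the innermost rectangle under (x, y), or None if there is none."""
    for rect in rects:  # topmost first, so the first hit wins
        if rect.contains(x, y):
            inner = hit_test(rect.children, x, y)
            return inner if inner is not None else rect
    return None

button = Rect("ok-button", (20, 20), (60, 40))
window = Rect("window", (10, 10), (200, 150), [button])
background = Rect("background", (0, 0), (606, 808))
screen = [window, background]
```

For example, `hit_test(screen, 30, 30)` descends through the window to the button, while `hit_test(screen, 300, 300)` falls through to the background. Note that nothing here looks at pixel colors; only the mouse location and the rectangle structures are consulted.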


Alan Kay 
Have designed a few programming languages - 2y 


What's the best way of combining functional and OO programming in 
practice? 

I'm assuming that the terms in the question are contemporary “code words" for very 
particular practices that now have the labels “functional programming” and “object- 
oriented programming”. 


I'm going to use much older and more original meanings. For example, I got a pure
math degree in my misspent youth, and so I think of functions in terms of
correspondences between “inputs and outputs”, and “official functions” as a particular 
kind of never changing correspondence between inputs and outputs, and that they 
have more relaxed relatives with looser correspondences between inputs and outputs. 


Similarly, I had something to do with programming and systems semantics in the 60s,
and one of the “recognitions” I had back then was that, since any computer with enough
storage can compute what any other computer can compute, all computing
semantics can be represented just with “semantic computers intercommunicating with
messages” as the sole building block.


I should have come up with a better term than “object-oriented” for this, but that ship
has long sailed. 


So any system with “semantic messaging computers” as objects can easily make any 
kind of “functional programming" system. 


But that’s not very interesting, except to note that a lot of expressive power would be 
thrown away by doing so. 


Another way to look at this is that “semantic messaging computers” allow any kinds of 
definitions for any kinds of things-in-process-or-not, so we can use it to easily make 
old style programming things like data-structures and procedures. This would be a 
very weak thing to do, but it is the predominant style of what is called OOP today. 


Or, we could note that “semantic messaging computers” can define and simulate the
parts of any complex system no matter the size and complexity, and can render the
parts themselves as systems, and so on recursively. That's a much more powerful way
to look at things (and is the way I looked at things back in the 60s).


Given this — and especially given the possible scalings involved — we need to think 
carefully about ways to make complex systems that allow us to do what we actually 
need, and also be able to think about — reason about — what we are doing. 


We especially need to have some good ideas about how to deal with “space and
time”. All of our systems need to “move through time” (which means their state will be
different in some ways at different times). This is true even for a pure “function” of the
first kind (which always produces the same output for a particular set of inputs).
Why? Because we have to make the function in the first place, and we have to debug
it and often improve it. A pure functional language cannot do that: some form of
kludge (or perhaps a really good idea?) has to be added.


I like to point to John McCarthy, one of the greats in our field and, as with so
many of the real greats, one who was very early with his ideas and impact. John was
an excellent mathematician who also had a deep feel for what is special about 
computing. 


In the latter part of the 50s John got more and more interested in what he started to 
call “Artificial Intelligence”. He was also doing some consulting and this brought him 
in contact with the SAGE air defense system: large systems of very large computers 
attached to radar stations and each other and usable by graphical display systems 
with pointing devices. 


John’s reaction was “Every home in America will have one of these”. He could see that 
the networked computers could be thought of as an “Information Utility" (as a parallel 
to the existing utilities for electricity, water, gas, etc...) and that the terminals in the 
homes could provide many kinds of “information services". Among other things, this 
got him to advocate that MIT etc do “time-sharing” of their large mainframes ... 


He also realized that the computer milieu of the 50s — machine code and the new 
Fortran — did not intersect well with “most people in US homes”. This got him to write 
a paper in 1958 — "Programs With Common Sense” — and to suggest that what was 
needed for the user interface was an active semi-intelligent agent — the “Advice 
Taker” — that could interact with users in their commonsense terms, could learn from 
“taking advice”, could problem solve on behalf of the user and itself, and so forth (MIT 
AI Memo 17).


This got him thinking about how to implement such an Advice Taker, whose main 
mechanisms would be various kinds of logical deductions including those that 
required actions. There wasn’t much to go on back then but a few gestures at “list 
processing”, so he decided to invent a language that could be used to make the 
Advice Taker (and other kinds of robots), and more generally allow symbolic 
computation to take its place alongside the existing numerical computation. 


John was an excellent mathematician and logician, and so he also wanted to come up 
with “A Mathematical Theory of Computation” to put ideas old and new on a firmer 
basis. 


His result was LISP (for “LISt Processing”). I have written elsewhere about its
significance. 


Meanwhile, he was pondering just what kind of logic, math, and programming (he 
thought of these as highly intertwined) could be used to deal with a robot in the real 
world. 


<eliminating detail here> A conflict was between at(robot, philadelphia) and
at(robot, new york), which could not happen simultaneously, but could happen “over
time”. This was like the problem of contemporary programming where variables (and
sometimes even files) would be overwritten: basically, letting the CPU of the computer
determine “time”.


This destructive processing both allows race conditions and also makes reasoning 
difficult. John started thinking about modal logics, but then realized that simply 
keeping histories of changes and indexing them with a “pseudo-time” when a “fact” 
was asserted to hold, could allow functional and logical reasoning and processing. He 
termed “situations” all the “facts” that held at a particular time — a kind of a “layer” 
that cuts through the world lines of the histories. cf McCarthy “Situations, Actions, and
Causal Laws", Stanford, 1963, prompted by Marvin Minsky for “Semantic Information
Processing".


One of the ways of looking at this scheme is that “logical time” was simply to be 
included in the simulations, and that “CPU time” would not figure into any 
computation. 
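

A minimal sketch of the "keep histories indexed by pseudo-time" idea, in Python. This is an illustration of the scheme as described above, not McCarthy's formalism: `History`, `assert_at`, `value_at`, and `situation` are hypothetical names, and pseudo-time is assumed to be a simple increasing integer.

```python
class History:
    """Append-only record of (pseudo_time, value) assertions for one fluent."""

    def __init__(self):
        self._entries = []  # kept in increasing pseudo-time order

    def assert_at(self, t, value):
        if self._entries and t <= self._entries[-1][0]:
            raise ValueError("pseudo-time must advance")
        self._entries.append((t, value))  # nothing is ever overwritten

    def value_at(self, t):
        """Latest value asserted at or before pseudo-time t (None if none)."""
        result = None
        for when, value in self._entries:
            if when > t:
                break
            result = value
        return result

def situation(histories, t):
    """The 'layer' of all facts that hold at pseudo-time t."""
    return {name: h.value_at(t) for name, h in histories.items()}

at_robot = History()
at_robot.assert_at(1, "philadelphia")
at_robot.assert_at(5, "new york")
world = {"at(robot)": at_robot}
```

Here `situation(world, 3)` and `situation(world, 5)` give different answers for where the robot is, with no conflict, because each answer is tied to its own pseudo-time and the order in which a CPU happened to run things plays no part.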


<more detail excluded here> This idea did not die, but it didn’t make it into the 
standard computing fads of that day, or even today. The dominant fad was to let the 
CPU run wild and try to protect with semaphores, etc. (These have the problem of 
system lockup, etc., but this weak style still is dominant.) 


Systems that have used part or all of John’s insight include Strachey’s CPL, Lucid, 
Simula, etc. Look at Dave Jefferson's TimeWarp schemes, Reed's NetOS, Lamport's 
Paxos, the Croquet system, etc. 


Back to McCarthy and — now — objects. One of the things we realized at Parc was 
that it would be a very good idea to implement as much of John’s “situations” and
“fluents” as possible, even if the histories were not kept very long.


For example, this would allow “real objects” to be world-lines of their stable states and 
they could get to their next stable state in a completely functional manner. They 
would be “viewing themselves” with no race conditions to get their next version. 


This would also be good for the multiple viewing we were starting to use. You really 
only want views to be allowed on stable objects (/relationships) and this can be done 
by restricting viewing to already computed “situational layers”. 
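

As a rough illustration of an object as a world-line of stable states, here is a Python sketch. It is not how any Parc system was built: `Counter`, `step`, and `WorldLine` are hypothetical, and simulated time is assumed to be just the index of a version.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Counter:
    """One stable state; frozen, so it can never be munged in place."""
    count: int = 0

def step(state, amount):
    """Pure function from one stable state to the next."""
    return replace(state, count=state.count + amount)

class WorldLine:
    def __init__(self, initial):
        self._versions = [initial]  # the index is simulated time

    def advance(self, transition, *args):
        current = self._versions[-1]  # read-only view of the latest stable state
        self._versions.append(transition(current, *args))

    def view(self, t=-1):
        """Views only ever see already computed, stable versions."""
        return self._versions[t]

line = WorldLine(Counter())
line.advance(step, 2)
line.advance(step, 3)
```

Because earlier versions are kept, `line.view(0)` and `line.view(1)` still answer after later steps, which is also the raw material for undo and rollback.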


Parc was also experimenting with "UNDO" and the larger community was starting to 
look at “parallel possible worlds reasoning”. 


The acts of programming itself also wanted to be in terms of “histories and versions” 
and systems should be able to be rolled back to previous versions (including “values”, 
not just code). cf Interlisp, and especially the PIE system (done in Smalltalk by 
Goldstein and Bobrow). 


This was another motivation for “deep John" in future systems. I.e. do everything in
terms of world-lines and “simulated time”. A recent paper by Alex Warth shows some
ways that “Worlds” can be quite fine-grained. http://www.vpri.org/pdf/tr2011001_final_worlds.pdf


The last point here is that “Histories R US". I.e. we need *both* progression in time for
most of our ideas and rememberings *and* we also want to reason clearly about how 
every detail was arrived at (and to advance the system). 


John McCarthy showed us how to do this 60 years ago this year and wrote it down for 
everyone to read and understand. 


So: both OOP and functional computation can be completely compatible (and should 
be!). There is no reason to munge state in objects, and there is no reason to invent 
“monads" in FP. We just have to realize that “computers are simulators” and figure out 
what to simulate. 




How did Smalltalk-80 manage to work with only 32,000 objects maximum 
(roughly)? 


Alan Kay 
Have designed a few programming languages - Updated Jul 20 


The simplest answers are that "it had to” and “we really wanted it to”. 


The Alto computer (April 1973) for many years only had 128KB of RAM, organized as 
64K 16 bit words, and roughly half of this was used for the 800x600 bit-map screen 
buffer. The 1K of microinstructions in a separate very fast 32bit wide memory and the 
16 zero-overhead task switching program counters allowed almost all of the functions 
done in the HW of more vanilla architectures to be rendered as SW. And this allowed 
us to come up with more ideal frameworks for VHLLs and still to have them run 
efficiently enough to make progress. 


The small amounts of working memory available made us want objects to have small 
footprints: the average Smalltalk object was less than 16 bytes in size. And we wanted 
to have as many as the address space would allow, so we made the address space of 
Smalltalk be 16 bits in object terms. We used the Lisp trick of having the even 
numbered objects be an encoding for small integers (the “tag bit" was the lowest 
order bit), so Smalltalk really had 64K objects, of which half were small integers (but I'll 
stick to the 32K for “others"). This meant that a typical Smalltalk full image would still
fit in about 1MB of space (and this allowed the object swapping scheme that
Smalltalk-76 used to work with the very slow and small “3rd level memory” Alto disk packs).
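

The tag-bit trick can be sketched as follows. This Python illustration follows the description above (even 16-bit words are small integers, odd words name other objects); the function names are hypothetical, and treating the 15 remaining bits as a signed two's complement value is an assumption of the sketch, not something stated in the text.

```python
MIN_SMALL, MAX_SMALL = -(1 << 14), (1 << 14) - 1  # assumed 15-bit signed range

def encode_small_int(n):
    """Pack a small integer into a 16-bit word whose lowest bit is 0."""
    if not MIN_SMALL <= n <= MAX_SMALL:
        raise OverflowError("does not fit in a tagged 16-bit word")
    return (n << 1) & 0xFFFF

def is_small_int(word):
    return (word & 1) == 0  # the tag is the lowest order bit

def decode_small_int(word):
    value = word >> 1  # the 15 payload bits, read as unsigned
    return value - (1 << 15) if value >= (1 << 14) else value

def encode_object_index(i):
    """Odd words name one of the 32K 'other' objects (index 0..32767)."""
    if not 0 <= i < (1 << 15):
        raise OverflowError("only 32K non-integer objects are addressable")
    return (i << 1) | 1
```

Half of the 64K possible words are spent on integers that need no storage at all, which leaves the 32K object limit the question asks about.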


Xerox Parc Alto (including display, keyboard, mouse, disk packs) and disk
drive and bulk of the machine under the table




A schematic of Smalltalk-76/78 showing the logical storage organization of a
running system, and a one-page “simulator” of the underlying byte-code 
interpreter (most of which was done in Alto microcode (Ingalls 78)). 


Much of both the architecture and compactness of the implementation was due to 
Dan Ingalls, who was recently inducted into the Computer History Museum Hall of 
Fellows (my vote for the top all around award in computing). 


A good example of what can be done with less than 32K objects can be seen in the 
tribute video I did for Ted Nelson some years ago now. For a Christmas vacation
project ca 2014, a few of us decided to see if we could revive a mysterious file from a 
Parc Alto disk pack that Xerox had thrown away, but which had luckily been rescued. It 
seemed to be a Smalltalk image file from the version done for the portable Notetaker 
computer in 1978. 


Xerox Parc Notetaker portable personal computer ca 1978 with both mouse and 
touch sensitive screen 


This was the first Smalltalk in which almost everything was completely written in 
Smalltalk, and only a tiny kernel had to be written in some lower level code. The story 
of this revival can be found here: Reviving Smalltalk-78—The First Modern Smalltalk 
Lives Again. The heavy lifting for the revival was done mainly by Dan Ingalls and
Vanessa Freudenberg. 


This was so much fun that I decided to use it to do all the presentation material for
the tribute talk I was asked to do for Ted Nelson's 75th birthday (video starts at 2:15).


You can see that this system from 1978 could do quite a lot with <32K objects and 
about 10K lines of Smalltalk code for everything down to the metal, including what 
would be called an “OS" today. 


The main difference here is that the revival had tons of memory space as working 
memory, and this allowed many more bit-map images used in the presentation to be 
fit into physical memory compared to then. Also, I am using the Dorado screen aspect
ratio — 1024x800 — instead of the Alto or Notetaker dimensions. As noted in the 
video, Javascript is many times faster than even the Dorado microcode at Parc, so it 
was relatively easy to write the kernel and automatically get a good cycle time for an 
end-user. 




The Xerox Parc Dorado console and screen ca 1979 running Smalltalk-78 


I wound up using almost all of the 32K possible objects for this video, so this is also a
way to see what can be done with this limit. 


The “How” part of the answer has to do with “actual complexity” vs “accidental 
complexity” and how the Smalltalk language and the Smalltalk system were designed. 


Smalltalk-76 was the first of the Smalltalks to render most of itself in itself, and much 
of the neatness of the design was due to Dan Ingalls’ ability to “choose wisely”. One of 
my favorite papers in computing is his “Principles of Programming Languages-78" 
paper about Smalltalk-76, and this will give some of the flavor. The bottom line is
that a really good and pretty complete system can be done with about 50 classes. 
Smalltalk-78 a few years later attempted to do a “really complete” system, and this
required less than 100 classes.


It was clear back then that inheritance in the way that Simula used it was on the one 
hand not strong enough to do a great job expressively, and on the other could be 
used willy nilly to violate good design principles (and the idea of “inheritance” itself). 
So “we” tried to be very careful. In the case of both -76 and -78, “we" was really Dan, 
who would pretty much rewrite the entire system for each of our major releases. This 
produced beautiful and compact expressions that had very little “accidental 
complexity”, and gave interesting examples of the “actual complexity” of relatively 
large interactive graphical systems. 


Bottom line: you don't need a lot of code in a well designed language and system to 
do a lot. So: besides getting a lot in about 10K lines of code, this also translated into 
getting a lot from less than 32K objects. 


Alan Kay


Have designed a few programming languages - 2y 


Is Object-Oriented Analysis to object-oriented design an easy transition? 


I'm pretty sure I don't understand this question.


I think *design* in general is extremely difficult to do well. Most things that can be
designed have a systems nature, and a good perspective on this will quite often lead 
to organizations that can be rendered on computers using a good enough object 
oriented environment. 


Alan Kay


Have designed a few programming languages - 2y 


Papers about the Smalltalk history refer to the importance of the Xerox
Alto's design as a "meta-computer", in which many unanticipated needs
could be emulated in microcode. Why are modern computer architectures
not microcoded?

The idea of microcode was invented by Maurice Wilkes, a great pioneer who arguably 
made the earliest programmable computer — the EDSAC (pace Manchester Baby). 
The idea depends partly on the existence of a “large enough" memory that is much 
faster (3-10 times) than the 1st level RAM of the computer. 


A milestone happened when the fast memory for microcoding was made reloadable. 
Now programmable functions that worked as quickly as wired functions could be 
supplied to make a “parametric” meta-machine. This technique was used in all of the 
Parc computers, both mainframes and personal computers. 


Typical ratios of speed of microcode memory to RAM were about 5x or more, and e.g.
the first Altos had 4kbytes (1k microinstructions) that could be loaded on the fly. The 
Alto also had 16 program counters into the microcode and a shared set of registers 
for doing work. While running, conditions on the Alto — like a disk sector passing, or 
horizontal retrace pulse on the CRT — were tied to the program counters and these 
were concurrently scanned to determine the program counter that would be used for 
the next microinstruction. (We didn't like or use “interrupts” ... ) 
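

The "pick a program counter for each microinstruction" arrangement can be sketched like this. It is a toy Python model rather than a description of the real Alto hardware: the class and method names are hypothetical, and both the priority rule (highest-numbered ready task wins) and the always-ready task 0 are assumptions made for the sketch.

```python
NUM_TASKS = 16

class MicroMachine:
    def __init__(self):
        self.pc = [0] * NUM_TASKS          # one microcode program counter per task
        self.wakeup = [False] * NUM_TASKS  # raised by conditions such as a disk sector passing
        self.wakeup[0] = True              # assumption: task 0 always has work to do

    def next_task(self):
        """Scan the conditions and choose whose program counter is used next."""
        for task in range(NUM_TASKS - 1, -1, -1):  # assumed priority order
            if self.wakeup[task]:
                return task
        return 0

    def cycle(self):
        """Run one microinstruction for the chosen task; nothing is saved or restored."""
        task = self.next_task()
        self.pc[task] += 1
        return task

machine = MicroMachine()
trace = [machine.cycle()]
machine.wakeup[9] = True   # e.g. a device condition becomes true
trace.append(machine.cycle())
machine.wakeup[9] = False  # the condition has been serviced
trace.append(machine.cycle())
```

Because every task keeps its own program counter, switching tasks is just a matter of which counter is selected on the next cycle, which is the sense in which the tasking has zero overhead.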


This provided "zero-overhead tasking” at the lowest level of the machine, and allowed 
the Alto to emulate almost everything that used to be the province of wired hardware. 


This made the machine affordable enough that we were able to build almost 2000 of 
them, and fast enough to do the functionality of 10-15 years in the future. 


Key uses of the microcode were in making suitable “language machines” for the VHLLs 
we invented and used at Parc (including Smalltalk, Mesa, etc.), doing real time high 
quality graphical and auditory “animations/synthesis”, and to provide important 
systems functions (e.g. certain kinds of memory management) as they were invented. 


It's worth looking at what could have been done with the early 16 bit VLSI CPUs such 
as the Intel 8086 or the Motorola 68K. These were CISC architectures and were fast 
enough internally to allow a kind of microcoding to support higher level language 
processing. It is particularly important to keep the code of what is a kind of interpreter
from being fetched from the same RAM that holds the program it is emulating.


The 68K in fact, used a kind of “nano-coding", which could have been directed to 
reloadability and language processing. 


The big problem back then was that neither Intel nor Motorola knew anything about 
software, and they didn't want to learn (and they didn't). 


The nature of microcode is that architectures which can do it resemble (and 
anticipated) the RISC architectures. And some of the early supercomputers — like the 
CDC 6600 — were essentially RISC architectures as well. So there was quite a bit of 
experience with this way of thinking. 


In the 80s, the ratio between RAM and CPU cycles was closing, and Moore's Law was 
starting to allow more transistors per chip. Accessing a faster memory off CPU chip 
started to pay off less (because going off chip costs in various ways, including speed). 


Meanwhile, it was well known that caching could help most kinds of architectures (a 
landmark study by Gordon Bell helped this understanding greatly), and that — if you 
are going to cache — you should have separate caches for instructions and for data. 


Up to a point, an instruction cache can act like a microcode memory for emulating 
VHLLs. The keys are for it (a) to be large enough to hold the inner loops of the 
interpreter, (b) to not be flushed spuriously, and (c) for the machine instructions to 
execute quickly compared to the cache memory cycle. 


Just to point the finger at Intel again, they did a terrible job with their cached 
architectures, in part because they didn’t understand what could be gained with 
VHLLs. 


A really interesting design was the first ARM — which was a pretty clean RISC and tidy 
in size. It could have been used as an emulator by wrapping it with fast instruction 
memory, but wasn't. I think this was a “point of view" disconnect. It was a very good
design for the purpose of its designers, and there wasn't enough of a VHLL culture to 
see how it could be used at levels much higher than C. 


If we cut to today, and look at the systems that could be much better done, we find 
that the general architectures are still much too much single level ones, that ultimately 
think that it is good to have the lowest levels in a kind of old style machine code 
programmed in a language like C. 


A very different way to look at it might be to say: well, we really want zillions of 
concurrent and safe processes with very fast intermessaging programmed at the 
highest levels — what kind of architecture would facilitate that? We certainly don't 
want either “interrupts” or long latency process switching (that seems crazy to “old
Parc people”). We probably want to have “data” and "processing" be really close to
each other rather than separated in the early von Neumann ways. 


And so forth. We won't be able to be perfect in our hardware designs or to anticipate 
every future need, so we must have ways to restructure the lowest levels when 
required. One way to do this these days is with FPGAs. And given what it costs to go 
off chips, microcoding is far from dead as another way to help make the systems that 
we desire. 


The simple sum up here is that “hardware is just software crystallized early", and a 
good systems designer should be able to design at all levels needed, and have the 
chops to make any of the levels if they can't be purchased ... 


Alan Kay
Have designed a few programming languages - Updated 2y
Who created the first computer with GUI and in which language? 


I've written elsewhere on Quora about this and similar questions. 


Both of the “earliest” British computers (Cambridge EDSAC, Manchester Baby and
Mark I) had electrostatic CRTs, which could show a limited 2D bit-map of part of
storage, and both soon were used for games. But I have never been able to find any
mention of pointing devices. 


There were 2D tactile sensors going back into the 20s in the form of joysticks with 
potentiometers on them that could be used for remote control of airplanes, etc. (I put 
a picture of one of these in another answer.) 


It is very likely that Whirlwind I at MIT was the earliest use of a pointing device on a
display (the light pen/gun invented by Bob Everett in the very early 50s). 


I don't think Whirlwind II (the SAGE air defense system) invoked commands with the
light gun, but used it to point to different displays on the screen of different aircraft 
info. 


SAGE Interactive Terminal With Light Gun (ca. 1956-7) 


My vote for “a very early” real system with a real GUI, including a pointing device, is 
Ivan Sutherland's Sketchpad (ca 1962). It was written in a sophisticated macro
assembler — CORAL — done by Larry Roberts (who later was the driving force behind 
ARPAnet) and ran on the TX-2 (the SAGE experimental computer at Lincoln Labs). 


By the way, the Wikipedia article on GUIs and their history is both inaccurate, and also
leaves out important early systems. I advise not taking it too seriously (this is often the
case with Wikipedia articles about computing, especially “earlier” computing). 


For example, the article wrongly states that the Parc GUI didn't have icons, dragging, 
etc. (why are people who are ignorant so pugnacious about bluffing?) 


Here is one of the systems at Parc (done in Smalltalk ca 1974) showing many important
aspects of what could be done, including multiple windows, icons, dragging, drawing, 
real-time animation, etc. 


Alan Kay
Have designed a few programming languages - 2y


When hobbyists built their own computers back in the 70s what 
applications did they write? 

That was a tough time for personal computing. Amongst the disenfranchised, there 
was a hunger for participation, and most did not have access or the funds to rent 
time-sharing terminals. 


At Parc we were lucky to have the funding and a few geniuses to be able to design 
and build our own personal computers, and to make them fast and large enough to 
do what personal computers 12-15 years in the future were going to do. So we did 
“real applications” of the future, such as the WYSIWYG word processor that became
Microsoft Word in the 80s. 


Even so, we would have had a very difficult — not impossible — time of it if the Intel 
1103 dynamic RAM VLSI chip hadn't appeared in 1971. Decent 2nd or 3rd level 
storage was in the future, and powerful enough VLSI CPUs were not to happen until 
the very end of the 70s. 


So most hobbyists tried to make do with either making or buying an 8-bit micro. The
displays were often just TV screens with a low res D/A modulation of an NTSC carrier. 


These could be programmed in either machine code or sometimes in a rudimentary 
version of BASIC (this could have been quite a bit better, and especially if JOSS had 
been taken as a model). 


This sounds limited — it was — but consider that Val Schorre was able in 1963 to do
Meta II in just 8K 6-bit bytes on an IBM 1401 (a much less nice architecture). Peter
Deutsch was able to do a remarkable Lisp in 1965 on a 4K words PDP-1 (roughly 8K+ 
bytes). 


In practice, most of the hobbyists didn't learn what Schorre and Deutsch knew about
computing, and were content to try to make things they felt were in their space, such
as games and often simple filing systems. The original BASIC at Dartmouth in the 60s
was a little richer, but was part of a time-sharing system that had considerable file
storage. Dartmouth had already made the university library open to anyone in the
town (Hanover) and decided to do the same thing with the TSS. So one could find in
the 60s in Hanover that the gas stations and convenience stores had teletype
terminals with real apps made by the end-users that were used every day. This was 
really interesting! And more like the future than the 8-bit micros were. 


In the 80s, when Apple released HyperCard, there was a huge surge of home-built 
useful applications, some really interesting, and some quite terrific. 


Alan Kay
Still trying to learn how to think better - 2y 


What are the pitfalls in implementing QFD? 


In *Marketing* “the customer is always right”. In *Education* “the customer doesn't 
know enough and needs to be taught stronger ideas”. 


Most of the important computer technologies in use today — Internet, personal 
computing, GUIs, etc. — were not thought about by customers. Some of the best of
these — and most useful for customers — were never adopted because the customers 
didn't want to learn new things, and the vendors couldn't (or didn't want to) find ways 
to fool them into real progress. 


So the real pitfall in schemes like QFD is that real progress often doesn’t get made,
and — worse — the results are often retrograde. 


A much better idea is to not think of “customers” — but *human beings* — and not 
what they *want* but what they actually *need*. Then figure out how to get them to 
learn it. 


Alan Kay


Worked at Apple (company) - 2y 
What was the problem that HyperCard solved? 


The question should be rephrased as “For what vital parts of Personal Computing 
was HyperCard a great advance?” 


“Personal Computing” and “Personal Computers” were mostly conceived and defined 
by the ARPA-IPTO/Parc research community of the 60s and 70s, but whose roots 
reached back into the 50s, especially to Whirlwind at MIT. 


“Really early” examples of these were “George” (an interactive higher level 
programming language on Whirlwind ca. 1954 by Laning and Zierler), and the Bendix
G-15 (by Huskey and Evans, et al. ca. 1956). 


These helped lead the ARPA community to conceive of “personal computing” as 
including the ideal and goal of interactive “symmetric use and authoring” of 
“everything”. The idea was that there should be ways to present to end-users to allow 
them to do all of the kinds of things that (a) could be done with computers and/or (b) 
were done by computer professionals, (c) and especially new kinds of things that 
computer professionals hadn't thought about or had devalued. 


This included “end-user programming” — and it was clear that some considerable 
work would need to be done to allow non-pros to do this. 


“Definitive and early" examples were “Sketchpad” by Ivan Sutherland, “The LINC” 
personal computer by Wes Clark, both at Lincoln Labs; and especially: JOSS by Cliff 
Shaw at RAND, all around 1962. The latter was especially aimed at people who could 
really make use of a well designed higher level “end-user” programming system: it was 
the first really great “end-user programming system”. 


Also in 1962, Engelbart submitted his first proposal and plan for a large scale wide 
spectrum personal computing system. This became one of the central resources for 
“what is it that we are trying to do?”. 


Xerox Parc in the 70s was an outgrowth of ARPA-IPTO, and we wanted to do “all of the 
above" plus to add “media” and “meta-media” as a first class goal. This was partly 
sparked by the Dynabook idea (with a flat-screen bit-map display). 


This led at Parc to the bit-map display, shaded images and being able to paint them, 
high quality fonts, the GUI, desktop publishing, 2.5D animation, and much more, 
including: end-user programming in Smalltalk by both adults and children, that 
included being able to program the media as well as use it and construct it. 


This sparked off both the Macintosh and a bit later Microsoft Windows, but on 
machines that were quite a bit weaker in computer power and capacity than the 
earlier Parc-built systems. The result was quite a subset. 


Dynabook, Magic Slate, Wild Card, and HyperCard 


“Apple Fellow” Bill Atkinson really liked the Dynabook idea, and wanted to do a 
version that he called “Magic Slate”. Bill was a kind of genius along a number of
dimensions, and one of these was a great sense for “fun”, “simple great ‘magic’”, and
the mind and emotions of “end-users”.


One idea he had for Magic Slate was that it acted like a stack of as many “pages” as 
you wanted, with the screen showing the top one, and you could search instantly to 
find pages to bring to the top to work on. The pages could hold any media. 


When the first Mac happened in 1984 he did a little desktop app called “Rolodex” 
which just held text — you could type anything onto a simulated Rolodex card — and 
it would instantly search and find and show faster than you could hit the “next” 
button. This was a truly great facility along all “end-user” dimensions. One of the 
things he realized is that “retrieval” almost always over-retrieves and thus the main 
feature that is needed is the fastest “rejection” that can be done. Because of this, 
“Rolodex” was used for everything. 


Meanwhile, he was working on a version of the Magic Slate media system, and 
eventually decided to do it for the Mac. This was originally called “Wild Card” — later, 
lawyers pointed out that it conflicted with already registered trademarks. So it became 
“HyperCard". 


The original internal version/demo of this had many of the features of the release
version, but not end-user scripting. To cut to the chase of a complex story: Apple
marketing didn’t want to put out HyperCard. I was also an Apple Fellow at the time,
and Bill asked me to help. We went to John Sculley and got him to understand the
significance, and John took on the champion role to get HyperCard productized and
out the door. I took the opportunity to get John to insist that it be made “end-user
programmable”.


The “Sculley-blessed” productizing was an immense project and left nothing to chance.


Fortunately — I think — I was involved in other things so I was not around to ruin this
project by insisting on a number of things that really had to be done right. The result 
was a masterpiece in many ways (and with quite a few things that were not done right 
... but the *masterpiece* was the point, not what wasn’t). 


HyperCard was a complete breakthrough as a wide-spectrum end-user personal 
computing system. The “brilliant naivete” as a whole thing was better for end-users 
than what we had done at Parc (it used only a small part of it, but made it much more 
learnable and useful for the average user than what we did). 


Outside of the company, HyperCard was quickly and widely adopted. A few years later 
Apple was able to count more than 4 million scripted applications *done by different 
users* in its “HyperTalk” scripting language. I think this is still the largest successful
wide-scale percentage adoption of an end-user media system with real programming 
in history. 


HyperTalk End-User Programming 


I have a few theories about why both HyperCard, and especially HyperTalk, were so
highly adopted by end-users. 


1. The “Rolodex” design was adopted and expanded — especially the automatic
indexing and extremely rapid search and retrieval: in almost all cases it was
still faster than an end-user's hand hitting “Next”.


2. It was very easy to make “media” on each card — generally easier than PPT,
etc.


3. The scripting was carefully modeled from the semantics and pragmatics of
end-user interactions with the media. Many HyperTalk scripts were just a
pseudo-English rendering of the user actions to accomplish the same goal.
(There's a real sense where this is teaching the end-user some weak non-scalable
ideas about programming — but Bill and Dan Winkler (rightly) only
cared about *getting the end-users to really write real programs*, and this
worked so well in HyperTalk.)


4. HyperTalk was not a great form for a programming language, and it wasn't
really English either. But a HyperTalk script *looked like something you could
understand and use, and modify when needed*. This “gistibility” was the
deep key in end-user-land.


5. HyperCard had dynamic context layers to which the scripts were attached.
The top layer was scripts for buttons and fields, then for the card, the
background (like a master in DTP), the stack of cards, the Home stack. And
finally were scripts associated with HyperCard itself.


6. Interestingly, you could also add stacks to this hierarchy to provide facilities
like that of a code library. There's also a bit of “object-oriented” feeling in this
and in many other HyperTalk features.


7. A script was either invoked by an event “On” or by a “message”. The scripts in
each layer were shown as a vertical panel in one view. This made “simple
things simple, and complex things possible”.


8. The ad hocness that made the first brushes with HyperCard feel so nice and
enticing and ultimately successful, did not scale well, and produced at least
one manual from Apple that was almost 700 pages long! However, I don’t
think more than a tiny few users even looked in the front part of this book.
HyperCard’s success was due to how the first level of stuff looked and felt: i.e.
it was a pure act of “end-user design”.
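

The layered scripts described in the list above can be pictured as a chain in which a message is tried at the innermost layer and passed outward until some layer has a handler for it. What follows is only a minimal sketch of that idea in Python, not HyperCard's actual implementation: the `Layer` class, `send`, `build_hierarchy`, and the handler bodies are all hypothetical names made up for illustration, and only the layer order is taken from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Layer:
    """One hypothetical context layer: a name, a parent, and its scripts."""
    name: str
    parent: Optional["Layer"] = None
    handlers: dict[str, Callable[[], str]] = field(default_factory=dict)

    def send(self, message: str) -> str:
        """Try this layer first, then each enclosing layer in turn."""
        layer: Optional[Layer] = self
        while layer is not None:
            handler = layer.handlers.get(message)
            if handler is not None:
                return f"{layer.name}: {handler()}"
            layer = layer.parent
        return f"unhandled: {message}"


def build_hierarchy() -> Layer:
    """Build the layer order from the text and return the innermost layer."""
    hypercard = Layer("HyperCard")
    home = Layer("Home stack", parent=hypercard)
    stack = Layer("stack", parent=home)
    background = Layer("background", parent=stack)
    card = Layer("card", parent=background)
    button = Layer("button", parent=card)

    # Illustrative scripts only; these behaviours are assumptions.
    button.handlers["mouseUp"] = lambda: "play a sound"
    background.handlers["mouseUp"] = lambda: "go to next card"
    home.handlers["help"] = lambda: "show help"
    return button


button = build_hierarchy()
print(button.send("mouseUp"))         # handled by the button itself
print(button.parent.send("mouseUp"))  # the card has no script; the background answers
print(button.send("help"))            # travels outward to the Home stack
print(button.send("openStack"))       # no layer in this sketch handles it
```

Under this reading, adding a stack to the hierarchy "like a code library" would amount to splicing one more `Layer` into the parent chain, so that its handlers become visible to everything beneath it.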


And ... there was nothing about the successful parts of HyperCard that couldn't be 
brought to full strength without violating its framework (unfortunately Apple 
marketing still didn't like/love it enough to get involved — HyperCard was shipped 
with every Mac as a “value-added” and that was that). 


The History, Success on the outside of Apple, Failure on the inside 


HyperCard was first shipped in 1987. The large scale adoption had happened by the 
early 90s. Many “experienced hands” including yours truly urged Apple to make
HyperCard the basis of an Internet-wide media system (and this was redoubled when
the WWW happened in 1992 and when the Mosaic browser was done the next year). 


That Apple did not do this will go down as one of the great tragedies of personal 
computing. The company was floundering deeply, with no plans that would
revitalize it (like really getting on the Internet to fulfill some of the other ARPA/Parc 
ideas). 


Important 


It's worth noting that the current Web browsers to this day (almost 30 years later), do 
not come close to realizing the ARPA/Parc principles of “symmetric use and authoring 
in a WYSIWYG environment”. The kinds of authoring that are allowed are completely 
ad hoc, limited, almost never symmetric with usage, and rarely involve good end-user 
interface design. 


To paraphrase Tony Hoare's remark about Algol, “HyperCard was not just great for its 
time, but was a considerable improvement on its successors”. 


New Ideas From Great Thinkers 


In the ARPA-IPTO heyday of the 60s, a lot of really great people were funded to follow 
their instincts about Licklider’s vision. This produced many ideas, some of which were 
coherent — this community communicated quite well (the grad students and interns 
were the messengers) — and some were quite distinct (the creators were “distinctive” 
and ARPA didn't care about coherency and didn't have master topdown plans). 


So there were lots of “Huh, | didn’t think of that!”s. 


Parc was much smaller, but had people drawn from the ARPA culture and a likewise 
desire for first class insights over coherence. 


In the 80s — after all of this — things were rather different in so many ways. 


But, still, there have been some of these special people with first class ideas that are 
independent of coherence. To mention just two: the way Bill Atkinson et al. thought 
about HyperCard and its scripting and packaging, and more recently the way Bret 
Victor has thought about a number of things, and most especially his take on both 
what Engelbart and HyperCard — and the ARPA vision — were all about when he
created “DynamicLand”. 


Really great people are “vitally interesting” regardless of what you agree with or not, 
what you like and what you don't, etc. They make an entire world for the rest of us to 
witness. 


That is the nature of Art and Artists. 


Alan Kay


Have designed a few programming languages - 2y 


Are DSLs a good way to represent complex object hierarchies? 


This question seems to assume that “complex object hierarchies” are worth better 
representation schemes. 


I think the more important question of good ways to describe “things that you would 
like to have instances of" is the one that needs to be treated first. 


I have always thought of “objects” as they would — and need to — function in a
running system, and have noted many times that there is not yet a great scheme for 
the descriptive semantically related methods to produce definitions from which the 
instances can be made. 


Certainly, the single hierarchy of classes and subclasses is both too restrictive 
semantically and open to deep kludges pragmatically (for example, most languages 
with single inheritance allow a subclass to be nothing at all like the superclasses ...). 


Some of the classic AI expert system languages did better, and perhaps the most
interesting and significant offspring of Smalltalk — PIE by Goldstein and Bobrow at
Parc — had objects represented as “multiple perspectives on the different roles they
might play, both operationally and definitionally”. This was more than a composition
or trait scheme, and I liked what they did a lot.


There definitely needs to be a language to deal with relational definitions (and that 
includes constraints to meaning). The complexities that have to be noticed can be 
seen in the CYC system, which has one of the most comprehensive ontologies. 


A key idea — which had its first experiments in the Flavors system at MIT — was the 
idea that — if you were going to do something like inheritance — you had to have 
something “algebraic” about it. I.e. a description of a relationship should be
perspicuous enough to show the reader what the new category of instances would be 
like — this is especially the case for more realistic descriptions that possibly involve 
multiple kinds of relations that are combined. 


If, instead, there is enough freedom to “hack definitions” — analogous to being able 
to mess with pointers at runtime with code — then “clever people” will eventually (and 
soon) make messes that are both non-understandable and dangerous. 


Alan Kay


Studied at University of Colorado Boulder - 2y 


How is the food at CU Boulder? 


I graduated from CU Boulder in 1966, so hope there is no food from back then still
around! I was a “starving student” back then so my options were limited.


Alan Kay

I am the Alan Kay in question. - 2y

What does Alan Kay think about the Commodore Amiga (1985) as a 
multimedia computer or an appealing tool for artists? How does it compare 
with Alto or Star, or with Sierra machine Atari was working on as you were 
Chief Scientist at SRL? 


I haven't found a good history of all this — but there might be one somewhere. It is a
very complex set of interactions and much of the context needed to get a good 
picture is beyond the scope of a Quora answer. Part of the complexity is due to the 
many idiosyncratic and aggressive personalities involved, and the zeitgeist of the 
entertainment industries in general. 


Just one of many salient facts: in 1982 Atari's yearly gross alone was more than the 
gross of the *entire* motion picture industry. Their profits were large, etc. 


After leaving Parc, I became Chief Scientist of Atari in 1982, and was soon joined by
Ted Hoff (the inventor and builder of the first microprocessor) as CTO. We were good 
friends, and the two of us together covered quite a bit of ground for futures for Atari. 


Being part of Warner Communications (the parent company) was not unlike being part 
of the Roman Empire. In common with other entertainment companies — and 
perhaps even more so — the culture was distinctly “New York" and rather 
“Mediterranean”. I had lived in NYC and gone to high school there so I was somewhat
acclimated to these ways of thinking and doing things. 


Cutting to the chase, there were several computers under development at Atari 
designed to fit into the “post-Alto” world of the 80s (highly graphical, with GUIs like 
those at Parc etc.). Warner was not very interested in these because they thought
they could milk the earlier generation of games machines forever. 


I pointed out to them that there would likely be an inflection point when “first-timers”
ceased to be the growth area of games. This happened in the second half of 1983, and 
suddenly they had arterial bleeding. 


Meanwhile, Jack Tramiel, the head of Commodore — a guy who was extreme even for 
the gaming industry (his favorite saying was “business is war” and he meant it literally) 
— got dismissed/resigned from Commodore at the start of 1984, and took some of 
the people with him to his new company.


In early 1984, the people doing the Amiga (one of them was a former Atari star), ran 
out of money and came to Atari for a loan. Ted and I thought what they were doing
was really good, and helped convince the powers to grant the loan (I forget when but
it was before I left in May — I think the loan happened after I had left — maybe in
June). 


In June or July — I was safely gone — Tramiel did a no cash buyout of Atari. The way I
heard the story, he thought he was going to get the Amiga as part of this. But Amiga
wound up doing a deal with Commodore, to the immense rage of Tramiel. I don't
think I understand the complete details involved, but there were forms of revenge on
both sides of these actions (I was now at Apple and paying attention to other things).


I should finish this with an opinion ... Pound for pound, I thought the Amiga was quite
a bit better than the first Macintosh. This is because the Amiga designers were
thinking bigger thoughts on the one hand — they wanted to do more kinds of things
— and because they were able to assess the Motorola 68K much more accurately —
and because of their heritage from computer gaming. All of these added up to a
manifest need for special hardware for assisting graphics, which was a must given the
cycle time and relationship of the 68K to RAM.


It was “interesting” that Commodore did not understand what they had with the 
Amiga: their world view was much smaller than the designers of the machine. This 
really hurt the adoption of the Amiga in all the areas it might have flourished. This is 
especially “interesting” given that the several OSs done for the Amiga were generally 
quite good at allowing the potential of the HW to be exploited. 


As yet another part of this complex history, it should be noted that IBM had 
successfully frozen almost all personal computer sales except its own when it 
promoted-and-dumped the PC jr. starting ca. mid-84 through early 1985, with an
enormous advertising campaign and artificially low prices, and then: *nothing*. This almost
killed Apple and the Mac, and it really hurt Amiga as well (and continued in the US). 
One result was that the Amiga was picked up and appreciated more in the UK and 
Europe than in the US. 


As I said, even an attempt at a summary is too complex ...


Alan Kay

I am the Alan Kay in question. - 2y

What does Alan Kay think about considering the creation of software as a 
design activity, putting it as a third item on the same level as science and 
art? 

This is another case, where instead of a simple question, what is needed are "10 
Things” and "20 examples”. 


Alan Kay
I am the Alan Kay in question. - 2y


What does Alan Kay think about Problem Frames (by Michael Jackson)? 

I try to avoid either praising or criticizing areas that I have not spent some time in
dealing with the details. This is one of them. I remember reading at least one of
Jackson's books on his approach to Requirements (but I've been living quite a bit in 
London the last few years and don't have this material at hand). 


I do remember a few things from contact back then.


One was not being particularly interested in design schemes for complex systems that 
couldn't be implemented, run and debugged. There are many reasons for this stance, 
some of which could be called “prejudicial”. Basically, I think it is important to have
your design philosophy as a vital layer of your development and running system. 


Another could also be called prejudicial: that trying to control more or less out of 
control mechanisms with a design scheme is analogous to and probably not much 
more practical than the ploys used to try to control large machine language 
programming projects in the early 60s. My thought then (and now) is that you are
much better off just inventing a good VHLL (and to have learned the chops to 
implement one). In fact, a number of VHLLs were the result of this way of looking at 
things (for example, Smalltalk), where concerns, both in the large and in the small (e.g. 
automatic storage allocation and collection, module protections, etc.) are part of the 
“given world". 


Two of the good ideas | remember from back then were the insistence that the human 
part of an application be looked at first, and that decomposition should be parallel, 
not hierarchical. But that is what was happening in the ARPA/Parc communities 30 
years earlier in the 60s and 70s — and especially for e.g. Smalltalk — so these ideas 
did not seem to add anything important to what this research community had already 
done. 


I most definitely agree that being able to do “requirements” qualitatively better — and
as part of the intrinsic design process — is one of the most dominant needs in
software engineering today. 


Alan Kay

I am the Alan Kay in question. - 2y

What does Alan Kay think about the following research proposal notes 
about a new software world? (https://osoco.es/thoughts/2020/06/notes- 
about-a-new-software-world/) 


We definitely need a new software world ... 
I think I can give almost the same answers to both of your recent questions.


Two heuristics I've used for more than 50 years now when thinking about “something
major” are first, to try to “write 10 things” about it, and then to “write 20 examples”,
ranging from “simple things that must be simple”, “major things that should be as
simple as possible”, things that are difficult that will have to be made, etc.


Then there is a basis for discussion and further comment. 


Alan Kay

Had something to do with "Object-Oriented Programming" - 2y

What does Alan Kay think about the development of software libraries 
adopting Seymour Papert's constructivism and "microworlds" concepts as 
described in these articles? 


“Sure!” 
I think I can give almost the same answers to both of your recent questions.


Two heuristics I've used for more than 50 years now when thinking about “something
major” are first, to try to “write 10 things” about it, and then to “write 20 examples”,
ranging from “simple things that must be simple”, “major things that should be as
simple as possible”, things that are difficult that will have to be made, etc.


Then there is a basis for discussion and further comment. 


Alan Kay

Have designed a few programming languages - 2y

What does Alan Kay think of the "programming as interaction" perspective? 
(http://tomasp.net/blog/2018/programming-interaction/) 

As the author of the blog — Tomas Petricek — mentions, Smalltalk 50 years ago had a 
very large number of the things he wants to do (and still does). 


However, it is possible that he is not aware of earlier — and even larger along some 
dimensions — work by Warren Teitelman in Lisp at MIT, BBN, and then at Parc. I think
he would have mentioned it if he was aware of this. 


For example, you can get Warren's (1965?) “Pilot” PhD thesis from MIT's CSAIL 
historical collection. 


Alan Kay

Have designed a few programming languages - 2y

What will be the potential of a Smalltalk system based on declarative 
programming instead of imperative? 


Could it then be called a “Smalltalk”? 


I think the key phrase here is “based on”, which is used instead of “including” (which it 
already does). 


Still, I think that having a really well tooled and hefty subset of a language be
“declarative” would be beneficial in many cases. 


There are tradeoffs in all styles of programming, and the actual worth of these often 
relies on the goodness of the debugger. For example, much declarative programming 
has some of the properties and feel of a simultaneous equations solver (in fact this is a 
subset of declarative programming). 


1. When the solvers can actually solve the system, it is like beneficial magic 
from the gods. 


2. When the solvers can't solve the system, they often can't tell just why. 


3. A few simultaneous relations are usually easy to think of when we try to 
program this way — but it is often difficult to get some kinds of dynamic 
systems to be synthesized from constraints (we may be wrong on the 
constraints, or on how they interact, or both). 


4. Sometimes it is both necessary and difficult to provide “advice” to solvers ... 
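

The first two points in the list above can be made concrete with a toy simultaneous-equations solver. This is a minimal sketch in Python, assuming a small square linear system with integer coefficients; `solve_linear` is a hypothetical helper written for this illustration and not part of any Smalltalk or constraint system mentioned here. When the system has a unique solution the caller simply gets the answer; when it does not, the sketch returns `None` and, like many solvers, cannot say whether the relations were contradictory or merely incomplete.

```python
from __future__ import annotations

from fractions import Fraction
from typing import Optional


def solve_linear(a: list[list[int]], b: list[int]) -> Optional[list[Fraction]]:
    """Solve a square system a*x = b by Gauss-Jordan elimination.

    Returns the unique solution, or None when no unique solution exists.
    """
    n = len(a)
    # Augmented matrix [a | b] in exact rational arithmetic.
    m = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            # The solver only knows it is stuck, not which "declaration" was wrong.
            return None
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [x / pivot_value for x in m[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


# x + y = 3 and x - y = 1: the "magic" case.
print(solve_linear([[1, 1], [1, -1]], [3, 1]))

# x + y = 3 and 2x + 2y = 7 contradict each other, while
# x + y = 3 and 2x + 2y = 6 say the same thing twice.
# Both come back as None, with no explanation of the difference.
print(solve_linear([[1, 1], [2, 2]], [3, 7]))
print(solve_linear([[1, 1], [2, 2]], [3, 6]))
```

A real declarative system would need much better diagnostics than a bare `None`, which is one reason the quality of the debugger matters so much in this style.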


Alan Kay
Have designed a few programming languages - 2y


What does Alan Kay think of a new Smalltalk system more focused on the 
problem than in the solution space (e.g., using the ideas proposed by 
Michael Jackson or David Harel)? 


Any highly extensible language can be made into what was called a “Problem 
Oriented Language” (POL) in the 60s (and many of the languages back then were in 
fact POLs — even Algol). 


Most languages can be programmed to exhibit any semantics, so I think the issues are
the bread of the sandwich: (a) are the syntactical forms that can be made good 
enough to express the POL ideas, and (b) can the pragmatic levels needed be 
achieved? 


Some of the variants of Smalltalk-80 can handle the pragmatics (because they include 
code for the actual VM and run-time system). The syntax scheme that Dan Ingalls 
worked out is pretty ingenious: it is both simple and quite readable, but perhaps not
extensible enough by itself for some POLs. 


However, the version of Squeak Smalltalk that we used in the naughts had a very nice 
implementation of Alex Warth’s OMeta as part of its live system, and this coupled with 
the late binding and live turtles all the way down of Smalltalk, would allow a wide 
range of POLs to be done (we did a variety for the STEPS project). 


Alan Kay 
Have designed a few programming languages - 2y 


Are there other “it's turtles all the way down" programming languages 
apart from the Lisps and Smalltalks? 


I'm not sure exactly what you mean by “turtles all the way down" here. It is sometimes 
used with regard to Smalltalk because every kind of entity in it is a Smalltalk object. 


Using this as a guide, we could observe that in certain languages centered around 
“procedures and data structures” every entity could be some kind of procedure or 
data structure. Lisp would be one of these because its notion of “procedure” is strong 
enough to model control structures such as “if” expressions. 
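

One way to see how a strong enough notion of “procedure” can stand in for a control structure is to build a conditional out of nothing but procedure calls. The sketch below does this in Python rather than Lisp, and it is an illustration under stated assumptions, not a description of how any particular Lisp implements `if` (in most Lisps `if` is a special form or macro). The names `true`, `false`, `if_`, `is_zero`, and `safe_div` are hypothetical. Each branch is wrapped in a zero-argument procedure so that only the chosen branch is ever run.

```python
from typing import Any, Callable

Thunk = Callable[[], Any]  # a branch, delayed until it is actually chosen


def true(then: Thunk, otherwise: Thunk) -> Any:
    """The 'true' procedure: run the first branch, ignore the second."""
    return then()


def false(then: Thunk, otherwise: Thunk) -> Any:
    """The 'false' procedure: run the second branch, ignore the first."""
    return otherwise()


def if_(condition: Callable[[Thunk, Thunk], Any], then: Thunk, otherwise: Thunk) -> Any:
    """A conditional that is only a procedure call: the condition does the choosing."""
    return condition(then, otherwise)


def is_zero(n: float) -> Callable[[Thunk, Thunk], Any]:
    """Turn a host-language comparison into one of the two choosing procedures.

    Indexing a tuple with a bool selects by data, so no built-in 'if' is used.
    """
    return (false, true)[n == 0]


def safe_div(n: float, d: float) -> Any:
    # The division sits inside a thunk, so it never runs when d is zero.
    return if_(is_zero(d), lambda: None, lambda: n / d)


print(safe_div(6, 3))
print(safe_div(1, 0))
```

The delayed branches are the important part: if both arguments were evaluated before the call, as ordinary procedure arguments are, `safe_div(1, 0)` would raise an error instead of returning `None`.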


Another way to think about “turtles all the way down” has to do with implementation. 
Very early in Lisp development appeared “LAP” (Lisp Assembly Program) and a Lisp 
Compiler written in Lisp which translated into LAP which assembled into machine 
code. 


The first Smalltalk had a way to generate machine codes, but this was not used for 
implementing the entire language at that time. But there were very interesting and 
useful self-bootstrapping models built-in to later Smalltalks, and eventually complete 
routes to bare hardware — the work of Tim Rowledge is particularly noteworthy and
should be looked at, as is the work of Dan Ingalls and John Maloney on Squeak. 


Semantically, it is generally easy to represent dynamic languages in themselves; the
largest amounts of work have to do with optimizations. For example, both Lisps and 
Smalltalks would often have a route that translated into C — as a kind of universal 
assembler — that both would be found on most CPUs and also would often be 
optimized. The advent of JITs — Peter Deutsch’s early one for Smalltalk should be
noted — along with lots of extra memory that would conveniently hold unwound 
machine codes — made a huge difference. 


At Parc — where we had microcodable hardware — we could tailor the HW to be 
“ideal machines” for VHLL languages. Here both senses of “turtles all the way down” 
were much more natural to do. 


I have left out the “Turing Turtle” idea: that most languages can be used to make a
scratch interpreter for themselves — or any other language. 


Alan Kay
Have designed a few programming languages - Updated 2y


What happened to the children exposed to Smalltalk at an early age in 
Xerox PARC, back in the 1970s-1980s? Did some of them grow to become 
great programmers/scientists. Do they use Smalltalk to this day? Or was 
there just no follow-up study? 




There were no follow-up studies (we couldn't even get funding from either Xerox or
NSF to do the longitudinal learning experiments that we thought were crucial to the
research).


And anecdotes are fun, but not much more than that. For example, the Goldeen 
sisters, Marian (started at age 12) and Rachel (at age 5) wound up going into math 
and science in college and then found their way to Apple as advanced system 
programmers, where I think they still are today. 


Bruce Horn (around age 12) did many interesting projects at Parc, including the 
microcode for the Smalltalk virtual machine for the Dorado when he was 16-17, and 
then was one of the two designers and implementers of the Finder for the Mac while 
still a teen-ager. He eventually got a PhD in CS from CMU and has created several 
important programming languages, etc. 


Etc. 


Lawson English made a good point in a comment which I'll reproduce here along with 
my reply: 


Of course, a big question is: how were these kids selected? 
Just living in that part of the country gives kids a huge leg up over living in another 
part of the country. 


Reply: This is why I didn't make any special claims. In any case, our (Adele's and my) 
model was that — in order to make any interesting claims — we really needed to 
show transfer via a serious longitudinal experiment that would take quite a few years. 
This didn't happen. 


Adele came up with a great and simple process for teaching the kids Smalltalk, and 
this was pretty much universally successful. 


We worked with quite a few different groups, including two theoretical extremes (a) a 
“disadvantaged” group from East Palo Alto, and (b) a “super IQ” group selected by the 
school they were in. 


Bottom line — and I think I mentioned this in the history I wrote — there was very 
little difference in the *latency* for understanding the different ideas. That was very 
interesting. The difference with the “high IQ” group was not less latency, but that they 
could really generate ideas of what to do next — they were very “live”: we rarely had 
to make suggestions. 


A more interesting situation years later was with an entire busing school in LA with 
hundreds of children (the rules were that the school had to have the same racial and 
ethnic balance as LA itself, and the children were selected by lot). However, the 
principal had made this school — in very humble temp buildings on the campus of a 
regular school — into the best school I have ever seen. 


So despite the random wide range of origins, the children were in fantastic mental and 
motivational shape. Going in, we pledged ourselves to a version of the Hippocratic Oath: 
“Above all, do no harm”. 


This was an illustration that “a great adult” such as the principal — Roberta Blatt — 
can create a great environment not just for the children, but also for the parents of the 
children in their homes. (As I try to say in almost every talk I give on almost any 
subject: to generally improve things for both children and adults, you have to 
generally improve the richness of the culture around them — trying to stuff in little 
snippets of knowledge in isolation only works for a tiny percentage of children and 
adults.) 


Another great exemplar was an LA high school teacher I became very friendly with — 
Jaime Escalante (the “Stand And Deliver” guy). He was really fantastic (and even better 
in real life than the great portrayal of him by Edward James Olmos). And the real story 
was even better than the movie (which is a little smudgy about why he was so 
successful). Jaime Escalante - Wikipedia 


He wound up being so successful that he was pretty much driven to resign from 
Garfield High School. Given how things actually work in the public education systems 
in the US, this is not at all surprising. 


Alan Kay 
Still trying to learn how to think better - 2y 


Why do many projects done by computer researchers (eg. Bret Victor or the 
VPRI) remain closed-source, even though papers get published about them? 


Because many kinds of experimentation are aimed at finding out things, not to 
produce tools for others to use. This is especially true in the user interface area, where 
a large number of different experiments are required with different kinds of users, and 
there is often never a complete system actually built. 


The results of these experiments can be summarized in both writings and videoed 
talks showing demos. There is almost always enough public information available to 
allow replication of the experiments. 


And, in most cases, every few years, something major and public is released, or put on 
public view. 


A question is often asked, “well why not make the code public anyway?” In practice 
this is generally completely impractical for many reasons, including just dealing with 
detailed questions and bug complaints about many obsolete versions, and that the 
kind of research that e.g. VPRI did is not at all incremental. Quite a bit of the work of 
this group over the years has been to start from scratch as often as possible. For 
example, there were 5 complete Smalltalk systems done at Xerox Parc at about 2 year 
intervals — each one tried different design ideas and required major implementations. 


Squeak Smalltalk used the “Blue Book” definitions (which were public and open), but 
did a completely new “bottom” (which was also made public and open). The tooling 
needed for others to be able to use the open sources was done carefully enough so 
this was successful. 


The “Etoys” system for children was done in Squeak and also made open source. The 
Croquet system was done in Squeak and one of the spinoffs was “Open Cobalt” a 
non-profit open foundation. 


Basically, seasoned researchers, especially on the fringes of knowledge, have to decide 
how to use their limited amount of time on the planet to best advance the research, 
while still holding to the scientific ideals of sharing what is powerful and interesting 
about the results. I think Bret Victor, VPRI and many other deep research efforts have 
done well at finding good balances for this. 


Alan Kay 
Still trying to learn how to think better - 2y 


What would be the necessary elements for setting up a biomedical 
equivalent to PARC? 


Before I write too much, I need to know if you are familiar with Janelia Labs (near 
Dulles Airport in Virginia). 


Janelia Research Campus - Wikipedia 


Janelia Research Campus is a scientific research campus of the Howard... 


https://en.wikipedia.org/wiki/Janelia_Research_Campus 


Or any of the other research organizations that Sidney Brenner helped set up ... ? 


He had a sure sense of “how to go about things”. He was one of my heroes when I 
was in molecular biology (briefly before getting seduced by computing), and years 
later I had a very long chat with him in 2004 when we both happened to be at the 
Kyoto Prize festivities. 


Alan Kay 
Still trying to learn how to think better - 2y 


Why were nearly all the cases of early computers made of plastics? 


“Early” as in “actually early”? As in 1945 to the 1970s? Or “early” as in “relative to your 
age”? 


“Actually early” cases were basically metal, as far as I know, mostly by direct experience. 
The Control Data 3600 (ca 1963) on which I spent a few years, had a very beautiful 
greenish glass and metal case .... 


Alan Kay 
Still trying to learn how to think better - 2y 


What should be learned to deeply understand the 20th and 21st centuries? 


Eric Naughton’s answer is very much pointed in a fruitful direction. 


A more general observation is that most human beings since our species started 
prowling the planet have been content with made up stories about themselves and 
their kind. This is partly because we are genetically wired for being part of a culture — 
we very likely co-evolved via the cultures that existed before us — and a large part of 
this has to do with how we use language and the stories we can fashion from it to 
provide what seem to be explanations about our situation. 


The heart of the paragraph above is Anthropology, some of whose branches are many 
different perspectives on psychology, social behaviors, and more recently “bio-behavior”, 
neuroethology, and most recently “behavioral economics”. 


One of the most important processes we do is the opposite of how we generally think 
about perceiving the world: we instead project our beliefs onto the world as part of 
what we think of as perception. This can lead to real trouble, and has many times over 
the millennia. 


There are several hundred additional serious thinking glitches that have been 
identified, all of which indicate that it is a full time job to hold onto an even barely 
accurate view of the world as revealed by the sciences. To pick just one of the 200, 
consider how blind we are to almost all scalings ... 


As with careful study and modeling of the heavens overturning our story-view that the 
rest of the universe revolves around us, the more recent studies of our species have 
overturned most of the myths we've made up about ourselves. But most people still 
adhere to their local myths, in part because the study of our species is not only not 
taught to children, in many cases it is against the law. 


The genetics that we have are not much changed from 100,000 years ago and some of 
this helped us survive, and what was against progress was not negative enough to do 
us in (e.g. anger expressed with a rock or a spear would not kill a village). Once scaling 
started to happen and genetics did not change to match up with it, even the scaling of 
one person to a band of warriors could kill a whole village and larger scalings of 
organization and technologies could kill whole territories. 


If the worldviews of people about their situation and about “the other” — their stories 
treated as “normal” and as "reality" — are allowed to remain as they were during 
hunting and gathering small tribe days, but now in a world of vast powers, reach, etc., 
then we have reached a point where much of what helped us 100,000 years ago now 
mostly hurts us, and what was against progress back then is terribly dangerous today. 


These thoughts are not at all new — but they are generally still foreign to most people 
today. In the study of humans, there is little that we do today that is at all surprising in 
the light of what is known — but it is still done, and is still surprising to most. 


On a slightly smaller scale we can see that the tragic responses to the pandemic — 
which has a contagion and prevention pattern that was completely understood long 
before COVID-19 happened — show just how little of what is actually known about 
us and the world we live in has been successfully taught to and learned by most 
humans over the last few hundred years. 


The climate crisis was identified in 1963, has been verified many times, is now an 
actual war in which invasions are continuously happening, and yet the story world of 
most people can still not allow what's actually going on to be thought about. Schools 
like to “teach to the test” — a kind of stupid idea — but the real tests are horrific 
disasters that require very different kinds of thinking than our bare genetics endows 
us with. Unless we can learn the “uncommonsenses” of the sciences and the ways of 
thinking that have to be learned to deal with these ideas, humanity will flunk the real 
tests (and the grade given out will be quite a bit worse than an "F"). 


Alan Kay 
Still trying to learn how to think better - 2y 


When you say, “the computer revolution hasn't happened yet”, do you mean 
that in the sense that “the printing press hasn't been invented yet”, or that 
“the printing press has been invented but it will take time before a ‘literary 
culture’ emerges”? 


I've done a number of talks and written a few essays on this subject — these will give 
a more expansive answer. 


I usually invoke a number of “thinkers about writing and printing” such as Marshall 
McLuhan, Frank Smith, etc., to set up what I think is a useful analogy. Four of the big 
ideas about writing (and reading) are (a) that many writing systems are powerful 
enough to be able to express important ideas in ways that others can learn to read, 
understand and think about, (b) that these expressions of ideas can travel far and live 
longer than their writers, (c) that the fluent learning of reading and writing causes 
important qualitative changes in human mentalities, and (d) these changes in turn 
raise the quality of thinking and ideas. 


As Frank Smith pointed out, literacy is thus not just about reading and writing but 
about having a literature, and Marshall McLuhan somewhat cryptically but powerfully 
explained that the power of any medium for communication is what you have to 
become in order to fluently use it. 


Writing via hand copies made a difference in “civilizations” but not for most of the 
members. The printing press spread many of the same plus new ideas around in 
profusion. As McLuhan pointed out, the initial subject matter of any new medium is 
attempts to reuse the old: the two best selling books in England after the press for 
almost a century were the Bible and an astrology book. But in the 17th century, about 
150 years after the press was introduced, very new thoughts at a large scale started to 
be written, distributed and read. A good tome about this is “The Printing Press As An 
Agent Of Change” by Elizabeth Eisenstein. 


Two who saw early what printed writing was going to likely be about were Erasmus 
and his friend the Venetian printer and writer Aldus Manutius. They realized that 
books would be inexpensive enough to be portable, and Aldus determined one of the 
sizes of books we have today by measuring saddle-bags in Venice and printing books 
that would fit in them (and inventing smaller fonts that would be very readable, etc). 
They also realized that printed books would allow more involved arguments, and that 
you could refer to what you had said and were going to say if you had page numbers 
in books (the Jewish culture of the annotated Talmuds had already invented these for 
the very same purpose). 


It was the gradual change in the nature and the subject matter of argument that 
brought forth the 17th century (and our modern world). This was the “printing 
revolution”. 


Now, when something like a computer comes along, those who have outlooks like 
Erasmus and Aldus — especially if they know about these two and the history of ideas 
— will immediately see that one way to look at it is as “a book that can read and write 
itself”, and that its dynamic nature is well suited to simulation of models, and that this 
will permit “argumentation of a very new sort” about “very important ideas” that will 
likely “lead to new ideas that can only be represented and reasoned about if you have 
a computer”. 


My research community was ARPA-IPTO in the 60s (to which Xerox Parc was added in 
the 70s), and many in it “had the outlooks of Erasmus and Aldus”, and the community 
as a whole did. McLuhan was not unknown, and it was realized that general humanity 
would likely react to the computer as he had pointed out about previous media: by 
trying to install the old content in it and to not look for the new. 


My research community wanted to invent much better ways for humans to think that 
would go far beyond what came forth to that point. And I think we made a good start 
at it, including trying to get the educational institutions to pay attention to the 
differences that were likely to come about. This was especially true at Parc, where we 
invented and made whole systems partly to show much more than demos “to avoid 
100 years of bumbling with more convenient forms of old media as the main focus”. 


The 1980s and 90s saw the commercialization of many of the ARPA-Parc ideas, and 
these almost entirely went as McLuhan would have predicted rather than the way we 
had hoped. To the point that the “qualitatively new forms of argument” and the “new 
literacies” are almost completely confined to science and engineering (especially the 
former). 


For example, the first good models of an important idea that really needed a 
computer to represent dynamically to argue about concerned the rising CO2 
concentrations in the atmosphere and what these might mean. The earliest warnings 
backed up by good science and needed computations were issued in 1963, and were 
heeded by many in the scientific communities who could see how powerful the 
combination of science and computation could be. 


This has been essentially completely lost to the public to this day, and unfortunately 
including almost all politicians and other “people with power”. 


On the other hand, today billions of people in the world do have personal computers 
(including smartphones) that are millions of times more powerful than needed to 
“argue about climate, pandemics, and many other critical issues and ideas”, but 
instead these computers are used as “conveniences for the older media that people 
have gotten used to and will buy”. This is very like the analogy to “the Bible and 
astrology” after the invention of the press. 


This is why I gave talks and wrote essays about “The computer revolution hasn't 
happened yet”. The lag in uptake with the real fruits of the press is quite 
understandable: there was nothing really to use as an analogy. But not to be able to 
look at the computer — given the analogies to the past — can be thought of as one 
of many massive failures of public education, to which can be added the failures 
regarding the climate and the much simpler to understand dynamics of contagious 
incurable diseases, and what should be even easier to learn about: the dynamics of 
scaling itself. 


Alan Kay 
Have designed a few programming languages - 2y 

How did they program the first computer knowing that programming a 
computer requires a computer? 


There's writing the program, and there's getting the program into the computer in a 
form that will run it. 


In “the old days” — not just in antiquity, but well into the 60s — most programs were 
written on paper before being punched on cards or typed into a file. Then a 
program in the computer would read a card or a file, and start converting it into 
machine code of some kind. 


To get started from “dead” there was usually some way to enter a short program in 
binary by hand. This would then read the first card or record on a file or paper tape. 
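

The two stages just described can be sketched as a toy simulation. This is only an illustration under stated assumptions: the list-based `memory`, the `deck` of cards represented as lists of integers, and the function names `hand_entered_bootstrap` and `card_loader` are all hypothetical, and no real machine's loader format is being reproduced. 

```python
def hand_entered_bootstrap(memory, deck):
    """Stage 0: the short program keyed in by hand (hypothetical).

    It knows how to do exactly one thing: copy the words of the first
    card into memory starting at address 0.
    """
    first_card = deck.pop(0)
    memory[:len(first_card)] = first_card
    return len(first_card)  # next free address


def card_loader(memory, deck, next_free):
    """Stage 1: stands in for the program that arrived on the first card.

    It is a little smarter than stage 0: it loads every remaining card
    into consecutive memory locations.
    """
    for card in deck:
        memory[next_free:next_free + len(card)] = card
        next_free += len(card)
    return next_free


# A made-up deck: the first card is "the loader", the rest are "the program".
memory = [0] * 16
deck = [[7, 7, 7], [1, 2], [3, 4, 5]]
next_free = hand_entered_bootstrap(memory, deck)
next_free = card_loader(memory, deck, next_free)
```

The point of the sketch is the layering: a tiny hand-entered routine brings in a slightly more capable loader, which then brings in everything else. 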


For example, here is another answer I did for Quora: 


Alan Kay - 2y 
How were computer programs and languages like COBOL 
installed into computers back in the 50s before software? 


There was always “software” for “programmable computers” because this... 


Alan Kay 
Have designed a few programming languages - 2y 


How much about physics, mechanics, and electronics should a computer 
scientist know about? 


I think it's a very good idea for computerists to have done the work to attain deep 
fluency in a well established scientific discipline (Physics is a good one). Most people 
in computing use the “science” in “computer science” more like “library science” than 
in any modern sense of what “real science” is all about. The lack of this understanding 
has really hurt progress in computing since the big explosion of the field starting in 
the 80s. 


Similar remarks can be made about the “engineering” in “software engineering”. Most 
people who claim they are doing this have very little idea about the levels that “real 
engineering” goes to in the established engineering disciplines. (And these days you 
can't be a “real engineer” without gaining considerable fluency in “real science”.) 


Alan Kay 
I am the Alan Kay in question. - 2y 


How often does Alan Kay find that the real problem is found at a level of 
abstraction that can’t be addressed by individuals, such as ‘the human 
condition’ or ‘societal values’? 


I'm not sure about all of the question, so please comment to explain more. 


Generally speaking, the “real problem” is almost always “the human condition” and/or 
“societal values”. 


But these problems can be addressed by both individuals and groups — for example, 
in trying to improve levels of learning and “enlightenment” in the general population 
and also in those who have gotten power in one way or another. 


Alan Kay 
Have designed a few programming languages - 2y 


Were all Wirth languages Algol-like (e.g. Pascal), or were any of them Lisp- 
like, or even something completely different? 


A fun question! 


Adriaan Van Wijngaarden was one of the earliest and most colorful computer 
scientists concerned with programming language design. He was on the Algol 
committees, and was a bit of an agitator, and especially in the directions of 
generalization and simplicity. John McCarthy was also on the committee and 
contributed many ideas, many of which were in Lisp, but only a few made it (such as 
conditional expressions). 


A landmark 1962 paper of van Wijngaarden’s — “Generalized Algol” — showed 
how you could retain the semantic powers of Algol while making the syntax vastly 
simpler. Much of this was essentially moving Algol towards Lisp. For example, many 
special constructs in Algol, could be rewritten as functions — and even in Algol, 
because it had “call by name” which delayed evaluation of an expression and retained 
the environment the expression would need when evaluated. So, the conditionals and 
looping constructs in Algol could be replaced, etc. 
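

The idea that delayed evaluation lets conditionals and loops become ordinary functions can be sketched as follows. This is a Python illustration, not Algol and not van Wijngaarden's notation: zero-argument lambdas stand in for call-by-name parameters (they delay evaluation and keep their defining environment), and the names `if_`, `while_`, and `safe_ratio` are made up for the example. 

```python
def if_(condition, then_thunk, else_thunk):
    """A conditional as a plain function.

    All three arguments arrive unevaluated (as thunks). Only the chosen
    branch is ever called, selected here by indexing rather than by a
    built-in conditional.
    """
    chosen = (else_thunk, then_thunk)[bool(condition())]
    return chosen()


def while_(condition, body):
    """A loop built from if_ and recursion (fine for short loops)."""
    def step():
        body()
        return while_(condition, body)
    return if_(condition, step, lambda: None)


def safe_ratio(n, d):
    # The division is wrapped in a thunk, so it is never evaluated when d == 0.
    return if_(lambda: d != 0, lambda: n / d, lambda: 0.0)


# Sum 0..4 with the function-based loop; the thunks share `state` via closure.
state = {"i": 0, "total": 0}

def add_next():
    state["total"] += state["i"]
    state["i"] += 1

while_(lambda: state["i"] < 5, add_next)
```

If the branches were passed as ordinary evaluated arguments, `safe_ratio(1, 0)` would fail before `if_` was even entered, which is why the delay matters. 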


This paper influenced Wirth when he was a grad student, and his first effort at 
Berkeley was “A Generalization of Algol” (CACM 1963) — not as general as van 
Wijngaarden’s, but “Wirthian” in erring on behalf of the implementor rather than the 
programmer. 


Wirth's next effort (with Helmut Weber) was to try to make a complete higher level 
language generator whose result would have the power of Algol (or more). This was 
“Euler” (CACM Jan Feb 1966), and I think it was Klaus’ best work. 


This was “going towards Lisp from the Algol side”. Also going on in the 60s was “Lisp 
2”, which was an attempt to “take Lisp towards Algol” (this was much later done really 
well at the Stanford AI Project as M-Lisp (David Canfield Smith), and Tony Hearn’s 
REDUCE). 


And also going on in the 60s was the work of Strachey, and especially Peter Landin, 
who showed how to really use lambda calculus to model the semantics of 
programming languages, and especially Algol. (These were very rich times!) 


The basic idea here was that there was no reason not to have all cakes and be able to 
eat all of them. The trick was to deal with how expressiveness traded off against 
extreme generality. 


The syntax “compiler-compiler” of Euler was the weakest part, but it could easily be 
done in other ways, and especially could be done in Euler itself. 


The virtual machine for Euler was modeled on the Burroughs B5000 (and would be 
called “byte-coded” today). Weber got this going in the somewhat forgiving 
microcode of one of the smaller IBM 360s (which used microcode to emulate 1401s, 
etc.). 
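

For readers who have not met the term, the general shape of a “byte-coded” virtual machine can be sketched in a few lines. The opcodes below are invented for illustration and are not Euler's or the B5000's instruction set; the sketch only shows a stack machine stepping through a flat list of codes. 

```python
PUSH, ADD, MUL, HALT = range(4)  # hypothetical opcodes, not Euler's


def run(code):
    """Interpret a flat list of opcodes and operands on an operand stack."""
    stack = []
    pc = 0  # program counter: index of the next code word
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:
            stack.append(code[pc])  # the operand follows the opcode
            pc += 1
        elif op == ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == MUL:
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")


# (2 + 3) * 4 written in postfix order
PROGRAM = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
result = run(PROGRAM)
```

A translator for a higher level language then only has to emit such code lists, and the same small interpreter loop can be written in microcode, in C, or in the language itself. 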


I was very excited by this — it was pretty darn clean and simple — and I used as many 
ideas as I could for the FLEX Machine that Ed Cheadle and I made. Flex was kind of an 
object-oriented Euler, with several different kinds of simple translators, written in Flex. 


These techniques were later used for Smalltalk at Xerox Parc. 


Klaus liked to be “practical” so he started to abandon these ideas for less wonderful 
ones. 


But a great end to this story is that he eventually wound up visiting at Xerox Parc, saw 
how we dealt with “practicality” by making machines suited for higher levels, and 
decided to do one of these himself. This was the genesis of “Lilith” and eventually the 
“Oberon” system. 


Alan Kay 
Still trying to learn how to think better - 2y 


Should writing code be fun? 


I wrote this in a comment to Ian Joyner’s answer. Here it is as an answer. Ian used a 
great quote by Al Perlis (the first Turing Award winner) about “Computer Science”: 


“I think that it's extraordinarily important that we in computer science keep fun in 
computing. When it started out, it was an awful lot of fun. Of course, the paying 
customers got shafted every now and then, and after a while we began to take their 
complaints seriously. We began to feel as if we really were responsible for the 
successful, error-free perfect use of these machines. I don't think we are. I think 
we're responsible for stretching them, setting them off in new directions, and 
keeping fun in the house. I hope the field of computer science never loses its sense 
of fun.” ~ Alan Perlis 


Bringing Al Perlis’ comment up to the present time, where “Computer Science” is kind 
of a sloppy term for “pretty much everything”, instead of the very strict sense of 
“Science” (as in “Real Science”) that he had in mind ... 


Note that in the larger world of highly developed disciplines, we want the airplane we 
are flying in to be designed by *serious engineers*, whose version of the Hippocratic 
Oath is “The bridge must not fall, the airplane must not crash ...”, rather than scientists 
or mathematicians who will generally learn more from a crash — and from playing 
around in general — than when everything goes well. 


The “fun” of engineering — and boy it really is fun to design and make things that 
really work well and help humanity along — is very different from the "fun" of science 
and math. Having “fun” is very much being able to do what you choose to do really 
well. 


Alan Kay 
Have designed a few programming languages - 2y 


When, and in what degree was today's level of abstraction in programming 
predicted? 


If you explain just what you mean by “abstraction” here (and “level”), I will be happy to 
attempt an answer. 


Alan Kay 
Invented and designed a few user interfaces ... - 2y 


Have you ever studied optical illusions to use similar principles in your UX 
designs? 

I have studied them quite a bit — as well as many books by research psychologists, 
including by Richard Gregory, Roger Shepard, etc. But not to use the principles in the 
UI designs per se, but as one of the most available routes to understand and test the 
many ways “our minds don’t work very well” — or to paraphrase Francis Bacon, “We 
humans have bad brains”. 


I did a lot of theater in my early 20s and was also studying Anthropology as a minor 
concentration (or as Jerry Bruner would put it “cultural psychology”). A lot of what is 
done both on the stage and with music is a kind of magic act where your designs and 
portrayals are encouraging the audience to use their brains and their imagination on 
behalf of what you would like them to think they see and hear and feel. 


When I started to invent and design UIs in the ARPA/Parc community, I realized that 
one very powerful way to make progress was to consider the human-computer 
relationship as “theatrical”, and to create an environment that would harness the 
intelligence and instincts to explore of the human user. 


I called what we were designing “the user illusion”, and I think that term is still in use. 


Alan Kay 
Have designed a few programming languages - 2y 


I searched YouTube for Alan Kay with Donald Knuth and found nothing. 
How does Alan Kay understand the essence of literate programming? Is it 
to design code like you would have to teach it? 


Don and | have been friends for more than 50 years now. The simple summary of 
“literate programming” is that "Don is amazing", both in “many ways” and in how he 
goes about programming, and what he thinks a program *is*. 


He wrote a nice short paper and example for a book about programming by Doug 
McIlroy, that got many prominent programmers to write a program for a problem — I 
think it was text justification or some such — and also comment about it. 


A much longer, and very illuminating example, is the “B” book for TeX. 


Don thinks of the form of a program as a kind of essay that has to be as 
comprehensible as possible to both humans and computers. The burden of this is on 
the human programmer, who has a set of tools to help create the composite 
document — this includes an “extractor” that extracts and stitches together the 
programming language text in a number of ways. 
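

To make the “extractor” idea concrete, here is a minimal sketch of one. It is not Don's WEB format or his actual tools: the chunk syntax (`<<name>>=` to open a named chunk, a lone `@` to close it, `<<name>>` on its own line to reference one) is an assumption loosely modeled on noweb-style conventions, and `collect_chunks`, `tangle`, and `SAMPLE` are hypothetical names. 

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+)>>=$")      # opens a named code chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>$")  # reference to another chunk


def collect_chunks(document):
    """Gather the code lines of each named chunk; prose lines are skipped."""
    chunks = {}
    current = None
    for line in document.splitlines():
        definition = CHUNK_DEF.match(line)
        if definition:
            current = definition.group(1)
            chunks.setdefault(current, [])
        elif line == "@":
            current = None
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name, indent=""):
    """Stitch chunk `name` together, expanding references recursively."""
    out = []
    for line in chunks[name]:
        ref = CHUNK_REF.match(line)
        if ref:
            out.extend(tangle(chunks, ref.group(2), indent + ref.group(1)))
        else:
            out.append(indent + line)
    return out


SAMPLE = """\
The program prints a greeting. First the outline:
<<main>>=
def main():
    <<body>>
@
The body is explained separately, in whatever order suits the reader.
<<body>>=
print("hello")
@
"""

program_lines = tangle(collect_chunks(SAMPLE), "main")
```

The essay can present the pieces in the order that is best for a human reader, and the extractor reassembles them in the order the compiler needs. 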


He has generally not been terribly interested in VHLLs, in part — I think — because a 
well explained MHLL is pretty darn understandable in considerable detail (at some 
point the “wealth” of detail starts to become a tax). 


Don started programming in the late 50s (when many of us old timers did), and the 
algorithms and data structures styles of those days produced programs that were 
somewhat “narrative” in structure, and Don's "Web" tools for “literate programming” 
facilitate narratives. 


As one moves from procedures and data structures to systems, especially at scale, the 
narrative structure quickly loses much of its benefits (you can tell the story of a few 
people in an apartment building in New York City, but trying to describe the city 
requires a very different kind of approach). 


There have been experiments with organizing “narrative threads” through system 
descriptions, none of which I'm aware of quite delivered enough value. Since we 
humans are organized around stories and narrative, finding ways to bridge this gap 
for systems would not only “be nice”, but I think it has to be a critical part of making 
further progress. 


Alan Kay 
Have designed a few programming languages - 2y 


Is it a good way to get an indirect tip on an invention by looking at a 
software program running, that is morphing into different shapes? If easier 
to get ideas this way than language, does that prove our language is too 
simple and must be developed? 

I don't quite understand the first part of your question, but as to the second part, 
there is quite a bit of evidence that we have what one of the inventors of cognitive 
psychology — Jerome Bruner — called “multiple ways of learning and knowing and 
remembering and thinking”. 


He picked three of them to be deeply studied. These were 


1. “Enactive” — how we learn and know and remember and think by “doing”, 
“touching”, etc. 


2. “Iconic” — how we learn and know and remember and think via 
“configurations” in a kind of spatial context (includes visual and auditory). 


3. “Symbolic” — how we learn and know and remember and think using 
symbols of various kinds to stand for ideas. 


Some of the previous work along these lines had been done by Piaget, and then by 
the mathematician Hadamard (in “The Psychology of Invention in the Mathematical 
Field”, a survey of the top mathematicians in the world ca 1945). 


I used this “multiple mentalities” idea as part of the foundation of the Xerox Parc GUI 
design, and it has many implications for curriculum design as well (see Bruner’s books, 
especially those from the 60s). 


I think the answer to your question in this light is that there are many dimensions on 
which to find ideas, and it's a good idea to use as many as possible. 


Alan Kay 
Still trying to learn how to think better - 2y 


What are the differences between social science, natural science, and 
humanities? 


Several good answers here already, so this will attempt to add more perspective. 


Our genetically created brains are set up to be social, and have propensities for 
learning language and our surrounding culture, including *stories* — our dominant 
way to remember, communicate, and make sense of things. All this activity goes on in 
our *minds*, which are the processes created by our brains. 


Even though we are only “thinking in between our ears”, our minds perform the trick 
of *projection*, which seems to put the part of our consciousness that is modeling the 
outside world out into the world (so we think we see and touch “what's out there” 
rather than having all the action take place internally). 


This is easy to reveal e.g. through some “optical illusions”, such as “size constancy” in 
which comparing (say) two oranges or poker chips, one twice as far away as the other 
— which show up on our retina as one half the size of the other — conflicts with our 
*knowing* — meaning “believing” — that they are both the same size. 


The resulting “story” that is made up from the direct phenomena and what we believe, 
is often a compromise — in this case we “see” the twice as far away object as about 
80% the size of the closer one, rather than 50%. In other cases, we may only “see” 
what we believe, and not what is there at all. 


One way to think about this is that we are in a kind of hallucination — a “waking 
dream” — even when we think we are “fully awake”. Another way to think about this is 
why “stories”, and especially “theater”, work with us. 


And, we should be able to see why it took modern physical humans — 50,000 to 
100,000 years on the planet — to finally invent real science in a way that stuck only 
about 400 years ago. This long delay is because we are basically satisfied with good 
stories as explanations, and our projections tend to make us see our stories out in the 
world. This is how we have existed (and how most people exist today). 


The story framework is deep within us, so what has basically happened with regard to 
“humanities, science, social science” is a progressive “being more careful with stories” 
(for some stories) over the last 2500 years. Both logic and its powerful offspring,
mathematics, enforce much more internal consistency; in addition, science requires
considerable work with learning how to gather phenomena “pre-story” in order to 
start to create maps into our language apparatus that will allow “careful stories” to be 
reasoned out. This progression also requires progressively more effort, both in 
learning, and in delving. 


The result today is that still only a relatively small percentage of humanity starts 
learning how to move from the quite magical unsophisticated worlds of our genetics 
and the traditional cultures which have co-evolved together. 


Within these efforts, all the different ways to try to think about us-and-our-thinking 
are extremely valuable, and not at all exclusive. The human tendency to desire “one 
true way to understand” does cause overshooting of the methods in each of the areas. 


For example, we find over the years divisions of the humanities making claims about 
parts of the universe where no good methods exist, and we find this in turn also for 
mathematics, and for science, etc. With regard to the latter two, Einstein said this in a 
famous lecture in 1921: “As far as the laws of mathematics refer to reality, they are 
not certain; and as far as they are certain, they do not refer to reality.” In other words, 
quite a bit of care is still required here, especially when mathematics is being used as 
one of the languages to represent models in science. 


A good rule of thumb in thinking is “The softer the area, the tougher you have to be”. 
People had to get somewhat tough to break out of the 10s of 1000s of years of story 
explanations, but once they did, the physical universe started to help, especially, early 
on in Physics, by being pretty nicely consistent with respect to careful observations, 
and stubbornly resistant to our own beliefs. Chemistry required a lot more work, and it 
took Biology all the way into the 1950s to come up with a complete enough 
fundamental theory and methods. 


Part of the nature of the physical sciences is the line often heard in first year physics 
“Assume a spherical cow” (for calculating air resistance and terminal velocities, etc.). In
other words, approximate methods work surprisingly well in the physical sciences,
partly because much of the epistemology of science is based on admitting the
existence of many kinds of errors — both in measurement and our noisy thinking — 
and inventing larger methods that do not allow the errors to obscure some of what 
nature seems to be doing. (There are important analogies here to the much simpler 
processes used in every communications and storage and computing system today to 
“embrace the errors that happen” with methods which restore the original intentions.) 


Where the physical sciences try to escape from our “waking hallucination”, it is not a 
large exaggeration to say that the centers of the social sciences and humanities are 
devoted to dealing with us and our quirks directly (though most would not define 
themselves quite this way). The field of “bio-behavior" is currently a rather small 
overlap between all three areas that is now slowly starting to grow (there are other 
names for related fields) where the object of study is “scientifically why” we humans 
behave as we do. 


The “softer nature of this nature” has required real toughness on the part of
anthropologists, psychologists, etc., to somehow “be scientific”, partly by trying to
prevent beliefs and desires about human beings from creeping into “conclusions” and
“results”. This is very difficult, and claims have to be treated with care. Still, this is
arguably the most important area of human knowledge to try to improve and grow.


An interesting deep area of study is how emotions are intertwined with our
communications and beliefs. Quite a bit of what we consider “fundamental meanings
of existence” is wrapped in how we feel — beyond what we conclude — and it is an
interesting question whether the gradually more sophisticated understanding of how
emotions work will produce enough illumination where it is most important.


Music is a very good example. At its heart it is about feelings and alternative ways to 
express them. A famous quote is: “Talking about music is like dancing about 
architecture”. And to wind up where we started — with a story — the 19th century 
composer Robert Schumann was asked after playing a new piece what it was about. 
He thought for a second, sat down at the piano, and played it again. 


Alan Kay
Invented and designed a few user interfaces ... - 2y


Are there any areas apart from user interfaces in which the Model - View - 
Controller (MVC) pattern could be applied in spirit? 
Sure. 


I wrote a paper in the late 70s or early 80s dealing with the idea that “everything” —
and more — in the kind of object-oriented programming we were doing could be
re-created in very useful ways by generalizing “viewing”. This included making objects,
inheritance, delegation, etc.


I don't think I have a copy of this paper, but it was for an “Infotech State Of The Art
Conference”.


It's worth thinking about what the terms could actually mean as abstractly as you can 


Alan Kay
Have designed a few programming languages - Updated 2y

Will plain English ever be a programming language, as in Star Trek? 

My boss at the National Center for Atmospheric Research in the 60s — Glenn Lewis — 
in the 50s was a Princeton grad student of von Neumann's, who — among his many 
personality idiosyncrasies — was famous for pooh-poohing “higher level languages”. 


A great story from Glenn: one time when asked about this, von Neumann summoned 
a grad student and said “write me a sort program”. The grad student went off, and 
von Neumann said “The grad student will either come back with a sort program or he 
won't come back at all — that’s ‘higher level language programming’!”


This should also be all that is needed to figure out the answer to your question. 


Alan Kay 
I am the Alan Kay in question. - 2y


What does Alan Kay think about the paperclip maximizer thought 
experiment? 


Of course, this is a very old idea in classic science fiction — that AIs, robots, etc., could
destroy humans if that would help some goal of theirs.


But consider e.g. Nazi Germany as “intelligence destroying humans as part of goals 
considered more important”. 


Or consider e.g. the climate crisis of today as “intelligence destroying humans as part 
of goals considered more important”. 


An “intelligence” doesn't need to be “artificial” in order to be a deep existential threat. 


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 2y


Why do we use digital computers for machine learning, AI applications, and
natural language and speech processing, if our brains are analog not 
digital, would it be better if we use analog Computing to mimic our brains? 


“Digital” is “Analog handled carefully”. Quite a bit of work in design and technology 
has to be done to make the noisy voltage levels (or any other means of transmitting a 
signal) be interpreted as discrete values. 


A high percentage of the bits on CDROMs and DVDs are there to provide various 
kinds of redundancies to protect the “information bits” and themselves (the 
“protection bits”). This has to be combined with processing to recover the original 
intentions. 
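
As a rough illustration of “protection bits” plus processing, here is a minimal Python sketch of a triple-repetition code with majority voting. This is an assumption chosen for simplicity: real optical discs use far more efficient codes, and the function names here are hypothetical.

```python
def encode(bits, copies=3):
    """Store every information bit `copies` times (the extra copies are the protection bits)."""
    return [b for b in bits for _ in range(copies)]

def decode(stored, copies=3):
    """Recover each information bit by majority vote over its copies."""
    groups = [stored[i:i + copies] for i in range(0, len(stored), copies)]
    return [1 if sum(group) * 2 > copies else 0 for group in groups]

message = [1, 0, 1, 1, 0, 0, 1, 0]
stored = encode(message)

# Simulate damage: flip one stored bit in each of two different groups.
damaged = stored[:]
damaged[0] ^= 1
damaged[10] ^= 1

recovered = decode(damaged)
```

Two thirds of the stored bits are redundancy here, and any single flipped bit per group is voted away during decoding.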


One of the reasons for “clocks” in digital computers is to try to isolate the times when a
bit can be interpreted with very low error from the times when its value is not easy to 
determine. 


So we can take the analog world and design it to be discrete. But we can take what's 
on offer in the discrete world and use it to simulate analog to levels far beyond the 
signal to noise of analog. This is what is done in the careful sampling of audio in both 
time and amplitude. 
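
To make the “amplitude” half of this concrete, the following Python sketch quantizes a full-scale sine wave to a given number of bits and measures the resulting signal-to-noise ratio. The uniform quantizer and the test signal are assumptions made for illustration. The standard textbook estimate for this case is about 6.02 dB per bit plus 1.76 dB, so each added bit buys roughly 6 dB.

```python
import math

def quantize(x, bits):
    """Round x to the nearest step of a uniform quantizer spanning [-1, 1]."""
    step = 2.0 / (2 ** bits)
    return round(x / step) * step

def measured_snr_db(bits, samples=100_000):
    """Ratio of signal power to quantization-error power, in decibels."""
    signal_power = 0.0
    noise_power = 0.0
    for n in range(samples):
        x = math.sin(2 * math.pi * 0.1234567 * n)  # arbitrary test tone
        error = quantize(x, bits) - x
        signal_power += x * x
        noise_power += error * error
    return 10 * math.log10(signal_power / noise_power)

snr_8 = measured_snr_db(8)
snr_16 = measured_snr_db(16)
textbook_16 = 6.02 * 16 + 1.76   # about 98 dB
```

The measured values should land close to the textbook estimate, and the discrete representation can be pushed further simply by using more bits per sample.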


And we can also make hybrid devices that use both. For example, the first ever optical 
mouse was done by Dick Lyon at Parc using a custom chip he designed and made 
which had an analog section for rapid computing where perfect accuracy was not 
important, and it had a digital section where things needed to be more nailed down (I 
think there are some Machine Learning chips that use analog values because the ML 
schemes are correlative and don't require perfect accuracy). 


Alan Kay


Still trying to learn how to think better - Updated 2y 


What was the first personal computer to use the GUI? 


It is really annoying to have “someone” change a question after it has been answered 
— especially if the answer no longer fits the new question. That is what happened 
here. The original question asked about “personal computers” and my answer is 
below. Now the question says “first computer”. 


I take a GUI to include being able to point at the screen both for manipulating
graphics and for issuing commands/requests.


Sketchpad in 1962 definitely had a GUI, and it was implemented on the TX-2
computer at Lincoln Labs (this was a very large research computer for the SAGE
system). Most commands were still given using separate physical buttons.


Graphical outputs were possible — and used for games — on both Manchester 
machines and on the EDSAC, all from either side of 1950. But I'm pretty sure that they 
didn't have pointing devices. 


The “light gun” — for pointing — was invented by Bob Everett for the MIT Whirlwind 
computer in the early 50s, and the combination of line drawing CRT and light gun was 
used in “Whirlwind II" — which was the SAGE system — and there it was used to point 
to displays on the screen to indicate which plane was of interest. I think that this was
the use of pointing for a parameter, but that the commands were entirely given by 
physical buttons. 


I don't know of the extent of possible GUIs on the earlier Whirlwind computer — it's
possible: they did many other interesting first things with interaction. 


So: certainly Sketchpad in 1962 on TX-2. 


That's a good question (some people require the “personal computer” to have been 
commercially available). 


But I think this is like requiring Engelbart or the RAND corporation to have been in
business to have been important parts of the history of GUIs.


There were “graphical user interfaces” before the Parc Alto, but they were on much 
larger machines — the experience was “personal computing” but there was not a 
“personal computer”. 


The Xerox Parc Alto was a personal computer, and it was available outside of Xerox. 
Several thousand were made and quite a few were donated to universities and to the 
US government. 


If we think of “GUI” as meaning overlapping windows, icons, pointing devices,
bit-mapped screens, etc., then I think choosing the “Xerox Parc Alto” as the answer to the
question would be quite accurate.


Alan Kay
Have designed a few programming languages - 2y


I heard today that there are some advantages of using a formal language to
write a software specification. What are they? 


All computer languages are “formal languages”, so we want something that will allow 
us to express the meaning of what we want without pragmatics (unless part of the 
meaning has requirements for certain pragmatics). We'd like it to be as clear as 
possible, and we'd like to be able to debug it as well (a spec that is not debuggable 
might not actually represent what is desired). 


For an example, let's look at sorting, and use a formal language that is like Prolog with
a more intuitive syntax. Sorting can be defined as:


sort A is B if B is-permutation-of A andif B is-ordered. 


Let's suppose our language already has definitions for “permutation” and “ordered”. In 
a Prolog-like language this will both check A and B, and also will try to generate B if it 
is undefined (this is not pretty, but it will do it). 


So, our little specification language is both providing a definition, and a semantic 
check on what we intend by “sort”, and it is also providing a running debuggable 
simulation of “sort” that can be used in a larger spec/prototype. 
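
The same idea can be sketched in Python, which (unlike a Prolog-like language) needs the “check” and the “generate” directions written separately. The function names are hypothetical, and the generator is deliberately naive: it simply searches permutations until one satisfies the specification.

```python
from collections import Counter
from itertools import permutations

def is_permutation_of(b, a):
    """True if b holds exactly the same items as a, in any order."""
    return Counter(b) == Counter(a)

def is_ordered(b):
    """True if every item is <= the one after it."""
    return all(x <= y for x, y in zip(b, b[1:]))

def sort_spec(a, b):
    """The specification used as a check: 'sort A is B'."""
    return is_permutation_of(b, a) and is_ordered(b)

def sort_by_spec(a):
    """The specification used as a (very slow) generator of B."""
    for candidate in permutations(a):
        if sort_spec(a, candidate):
            return list(candidate)

result = sort_by_spec([3, 1, 2])
```

The search takes factorial time, which is the “not pretty” part, but it gives a runnable reference against which faster sorting methods can be tested.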


(Note that the makers of the web and Quora have decided that I can't run and test
this ...)


The next steps would be to — separately — start writing “sorting cases" for the many 
different ways we know how to do sorting. These cases could be headed by the 
conditions that would select which method to use, e.g. given the size of a particular 
set, and/or the longevity of the set, or the frequency of resorting, etc. 


We can see that a really good “spec”/“requirements” language would make a big
positive difference if we could get a running prototype out of it. This might require a
supercomputer, but it is standard practice in other branches of engineering to do as
much work as possible with the semantics of models to get the design as intended.


In practice, not enough smart people have worked on this problem over the years, and 
the funders never got interested enough in it. I think it should be one of the “grand
challenges” of the next few years in Software Engineering. 


Alan Kay
Still trying to learn how to think better - 2y


Were Xerox PARC systems like the Star robust for third-party application 
development? 


As the question is phrased, the answer would be “yes”. 


However, at least in the early and main phases of the Star (and I'm guessing: all) there
was no intention that anyone other than Xerox would do applications for the Star.


The robustness came from the ways the Star, its OS, its UI and its programming
language were designed and built. In a word: “principled”. 


My research group and I were not a part of the Star development (although the UI
design “notions” for Star were first suggested and prototyped in Smalltalk by David
Canfield Smith). I was friends with many of the Star designers, and got to witness
various stages of this process.


(I should mention here that I didn’t agree with the general approach that was taken
for the Star — both HW and SW — but since I was mostly interested in getting on
to personal computers for everyone, especially children, I didn’t have an emotional
stake in the Star.)


But the deep — and mostly successful — attempt to be “principled” about everything 
was interesting to watch, and it produced some really thoughtfully worked out 
documents. 


I thought the most effective and important ones were about the Star UI, and some of
these might be available. There is a good retrospective document by Bill Verplank that 
is worth reading. 


I don't think there was any connection, but the later “Macintosh User Interface
Guidelines” document — mostly by Chris Espinosa (I think) — was a really good set of
principles about the Mac UI and what developers *must* do to create a “MacApp”
(and there was a development system of the same name that helped developers make
UIs for their products that would be in accord with the guidelines). Chris Espinosa was
also a great bulldog and knee shooter — a kind of hero, actually — who made an
enormous difference to how Macs looked and felt in their early days.


Back to the question, the Star OS, Language, and IDE were all heavy duty designs done 
by experts, see: 


Xerox Star - Wikipedia
https://en.wikipedia.org/wiki/Xerox_Star


There were several software “religions” (or at least belief systems or “paradigms”) at 
Parc, and the Star system was done using a different one than the one my research 
group had been evolving. I thought then (and now) that our lighter weight, much
smaller, “fast enough”, and still very safe approach would have worked out better. For
example, if you wind up swapping pages much of the time because your system 
wound up “large”, it doesn't matter how fast your inner loops are ... 


But we will never know. 


Alan Kay
Have designed a few programming languages - 2y


How were multi-user programs (e.g., MUDs) implemented on time-share 
systems? Shared memory? IPC? 


Every which way ... 


In the sixties there was an attempt to protect large processes from each other using 
base-bounds registers (which were quite often merged with the hardware used for 
mapping swappable portions of a virtual memory). 


For example, on the SDS-940 that Engelbart’s group used for “the mother of all
demos” the “Genie OS” and the hardware mapped 64K 24-bit word virtual memory for
each process as pages of 2K words each. This OS did have a pretty good
process-to-process message scheme, so processes could communicate. But many kinds of
unprotected threading were also used within these protection boundaries (they did
everything that could possibly help their goals — and they were good enough to
survive the risks they took).
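
The arithmetic of such a page map can be sketched in a few lines of Python, using only the sizes quoted above. The page table contents and the `translate` function are hypothetical, and this is a generic illustration of paging, not a description of how the actual hardware or OS did it.

```python
PAGE_WORDS = 2 * 1024        # 2K words per page, as quoted above
VIRTUAL_WORDS = 64 * 1024    # 64K-word virtual memory per process, as quoted above
PAGES = VIRTUAL_WORDS // PAGE_WORDS

def translate(virtual_address, page_table):
    """Map a virtual word address to a physical one, or refuse."""
    if not 0 <= virtual_address < VIRTUAL_WORDS:
        raise MemoryError("address is outside this process's virtual memory")
    page, offset = divmod(virtual_address, PAGE_WORDS)
    frame = page_table.get(page)
    if frame is None:
        raise MemoryError(f"page {page} is not mapped for this process")
    return frame * PAGE_WORDS + offset

# Hypothetical map for one process: virtual page number -> physical page frame.
page_table = {0: 7, 1: 3}
physical = translate(2048 + 5, page_table)   # word 5 of virtual page 1
```

Because a process can only reach frames listed in its own table, the same mechanism that supports swapping also keeps processes out of each other's memory.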


Basically, most systems were a combination of "good" and "bad" ways to do things. 


An interesting exception — and a very early one — was the Burroughs B5000, which 
used very different schemes in HW and SW from all the rest, and "in general” every 
part of their systems was much cleaner, safer, more efficient, etc. 


Finally, certain programming language/systems, including several at Parc, were 
designed successfully to be highly safe, and thus were able to do in software (usually 
microcode and the VHLLs) what normally would require specific HW to accomplish. 


Alan Kay
Still trying to learn how to think better - 2y


Does abstraction in computer science use mathematics as a basis? 


“Mathematics” doesn't have a crisp definition, and its plural form has admitted many 
new thinking schemes over the years. 


I think we are doing mathematics whenever we are making a concerted effort to relate
organizations of ideas to each other (von Neumann called mathematics “relationships
about relationships”).


From this point of view “abstractions in computer science” *are* a form of 
mathematics. 


If “mathematics” in the question is supposed to mean “already existing forms of 
mathematics”, then the answer is also yes — and ranges from the use of logic*, to 
schemes to represent numbers and operations, correspondences that are both 
functional, “functional for a particular time”, and “quasi-functional”, etc., to systems 
organizations and notions about process. 


* When I was learning “mathematics” more than 70 years ago, many logicians held
that Logic was a separate field and earlier efforts in the century had been made to 
“get Mathematics from Logic”. This division could still be the case (but in my view they 
are all part of the same set of ideas and ways to approach things). 


Alan Kay
Have designed a few programming languages - 2y


How does the brain interpret computer languages? 


There is not much definitive understanding about how the brain does most things, 
including traditional languages, mathematical languages, music, computer languages, 
etc. 


It's worth noting that virtually all the studies of fluency in any of these have found that 
what the brain/mind knows is strongly projected onto sense impressions (a) in order 
to recognize them at all, and (b) to make many instant predictions about what else is 
likely to be true. For example, when a numerate person is shown a number, they don't 
just recognize it as a number, or see what the number signifies, but they also “project 
out” the number into many perspectives, including magnitudes, possible factorings, 
relationships with other numbers, etc. This is also the case with music, and with 
computer languages. 


In all cases, the “projecting out” that manifests “possibilities” is a critical part of how 
we understand. 


In a logic system, this would be called “forward inferencing” or “aggressive
inferencing”. It puts many more things in play than were in the original stimuli and
offered knowledge.
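
For readers who have not met the term, here is a minimal Python sketch of forward inferencing over simple if-then rules, loosely following the number example above. The rule set and function name are made up for illustration, and this is a toy logic engine, not a model of how a brain works.

```python
def forward_chain(facts, rules):
    """Keep applying rules whose premises all hold until nothing new appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known

# Hypothetical rules: (list of premises, conclusion).
rules = [
    (["is 12"], "is even"),
    (["is 12"], "is divisible by 3"),
    (["is even", "is divisible by 3"], "is divisible by 6"),
]

inferred = forward_chain(["is 12"], rules)
```

One offered fact ends up as four known facts, which is the sense in which more is put in play than was in the stimulus.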


Alan Kay
Invented and designed a few user interfaces ... - Updated 2y


Do you agree with Myers and Rosson's statement that ‘nearly half of entire
software development effort relates to the user interface’? Defend your 
answer. 


This sounds like a homework assignment. 


If you were to read their paper, you'd see that it was written 30 years ago in 1992. 
Practically the first thing that Brad Myers complains about is that the previous survey 
dated back to 1978, and there was a need for one that was up to date! 


And to give you more of an idea about then vs today, he said they got quite a wide 
response to their survey (it was all of about 70 responses!) 


I think this is an excellent paper, but for the purpose of the question, you need to find
a much more recent claim so you can ask the question more accurately. 


Alan Kay 
Invented and designed a few user interfaces ... - 2y 




When Alan Kay invented the user interface (screwdriver metaphor), had he 
in mind disabled people too? 


I didn't invent the user interface — or the graphical user interface — many of the basic
ideas were first invented and implemented in the 50s, and especially in the first half of 
the 60s — and quite a few of these were done by DoD supported research, especially 
by ARPA. 


Even the early ideas had considerable range, including ideas that were like
“hand-tools” and those that were like “people as helpers”. These included the use of AI in
various ways. John McCarthy's “Advice Taker” in 1958 was an AI “interface agent” that
could interact in commonsense terms. The notion of “an Oliver” (named for Oliver
Selfridge) as a kind of super Siri was a mid-60s scenario. Hands free UIs were part of
the larger design space.


An early Braille UI for a disabled person was for Ted Glazer at MIT, who was physically
blind but mentally had great sight (he designed the displays for DEC and was a power 
in programming languages back then). 


My father was a physiologist whose central interest was both the design and 
psychology of prosthetics, first for veterans, and then for Thalidomide babies. This was 
a background theme for me growing up. 


“Personal computing” was a notion within ARPA that predated “personal computers”. I
got interested in the latter because of the possibilities for rich and quick interactions,
and — as a musician — started to use playing as a metaphor: “a computer is an 
instrument whose music is ideas”. 


I was working on a desktop personal computer design in the last part of the 60s when
I met Seymour Papert and saw what he was doing with children and “powerful ideas”.
This flipped my perspective about most things, and I started to think about “children’s
computing” and children’s computers and user interfaces.


One of the flips was from “tools” and “vehicles” and “agents” to “media” as a larger 
way to think of the actual powers that were needed and furnished. Another flip was 
from a UI metaphor of “access to function and power” (like the control room of a
nuclear submarine) to “understanding and learning” (maybe more like a Montessori 
school). 


The Parc GUI came out of those and other flips. 


We also were able to get one of the truly great eye-trackers of the day (done at SRI) 
and used it for many experiments. Sam Gebner at MIT (a student of Minsky’s) had
done a terrific thesis using eye-tracking for many I/O tasks, including pointing, and we 
all expected that GUIs would include eye-tracking. Nicholas Negroponte was another 
strong proponent of this. 


Today, it is really a shame — and worse — that with tiny cameras in so many personal 
computers and phones, very little has been done to do eye- and pupil dilation- 
tracking. One corneal reflection (using IR) is good enough for about 2°-3°, and this is 
about the size of a word on a screen. 


However, the general art of UI itself has diminished gradually and now greatly as the 
commercialization of personal computing “progressed”, starting in the 80s. 


Alan Kay
Still trying to learn how to think better - Updated 2y

How do computers work? I am 16 and I love programming, but I feel like I
should know what really happens on the inside, the physics of it.


I've answered various parts of this question elsewhere in Quora. Try this one to get 
started. You will see that the fundamental building blocks for making a computer can 
be made in many ways, and often from completely common materials. 


Alan Kay's answer to How do computers work the way they do? When does electricity 
become executable logic and how? 


Check back here every few days for a few more references that will help. 


Alan Kay 
Still trying to learn how to think better - Updated 2y 


How do computers work the way they do? When does electricity become 
executable logic and how? 


Many of the answers here have more elements than needed — for example, 
transistors and how they work are not required. 


One way to think about digital programmable computers is that they can be 
completely made from “switchable switches”, that is: switches that can be switched by 
signals to produce signals that can switch switches. 


Electricity is not needed. For example, switchable switches — and thus a 
programmable computer — can be made from tinkertoys, ropes and pulleys, water or 
gas flow, etc.


[Figure: two tinkertoy constructions. Top: INPUT A and INPUT B feeding one OUTPUT. Bottom: one INPUT and one OUTPUT.]
Tinkertoy logic. Signals are “push” or “no-push” of a rod.


The top construction will transmit a “push” if “either or both” of “INPUT A” or 
“INPUT B” are a “push” — this is the logical operation “OR”. 


The bottom construction will push if the input is NOT a push, and will NOT push 
if the input is a push. This is the logical operation “NOT”. 


These two operations are sufficient to make any computer. For example, here is a 
TIC TAC TOE playing tinkertoy computer made by Danny Hillis and Brian 
Silverman. 
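
To see why OR and NOT are enough for the logic, here is a small Python sketch that builds AND, XOR, and a one-bit adder from nothing but those two operations. Using Python booleans in place of rods is of course an assumption of the sketch, and the function names are illustrative.

```python
def OR(a, b):
    return a or b

def NOT(a):
    return not a

def AND(a, b):
    # De Morgan: a AND b is the same as NOT (NOT a OR NOT b).
    return NOT(OR(NOT(a), NOT(b)))

def XOR(a, b):
    # True when exactly one input is true.
    return AND(OR(a, b), NOT(AND(a, b)))

def half_adder(a, b):
    """Add two one-bit values, returning (sum bit, carry bit)."""
    return XOR(a, b), AND(a, b)

table = {(a, b): half_adder(a, b) for a in (False, True) for b in (False, True)}
```

Every function below the first two is defined only in terms of the ones above it, so a machine that can do “push if either” and “push if not” can be wired up to add.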


Electricity is convenient in many ways. One way to get electricity to cause something 
to happen is by making electromagnets (a battery, nail, and insulated wire are all that 
is needed). 


Making an Electromagnet. Up to a point, a battery can be thought of as something
like a canister vacuum cleaner that produces high pressure at one end (the top)
and low pressure at the bottom. Wires can be thought of as tubes. Charge can be 
thought of like particles. The tube on top is at pressure, and the particles will 
flow to the low pressure tube on the bottom if a tube is used to connect them. 
This flow can cause work to happen. 


There's more to electricity than this, but much can be visualized just by keeping 
the flow of charge in mind. “Voltage” (or “tension”) is analogous to “pressure”, 
and “current” is analogous to particle flow. 




When charge is able to flow through a wire, an electromagnetic field is 
produced, and this can be intensified by a magnetic metal such as an iron nail. 


We can make a switchable switch by having an electromagnet pull on a physical 
switch to turn on or off current in a further set of wires. 


[Figure labels: iron nail; coil of insulated wire.]


This is all that is needed. Note that many things can be done just with these ingredients,
and especially, all that is necessary to make any computer.


Here, the output is the same as the input, and this “relay”ing of the signal (using a
fresh source of current) was how this was originally used in telegraphs in the 19th
century.


But we could organize the spring the other way, and this would send the “negation” 
(the opposite, the NOT) of the input. (Below we'll use NOT in a very interesting way.) 


We could organize two of the bottom elements so both electromagnets need to pull
their switches down to transmit a signal — this would be the AND logic operation. 


And, on the right is shown how to organize things so that either OR both inputs will 
turn on the light. 


Do you see how these work? 


[Figure: relay circuits for the AND operation and, on the right, the OR operation.]
How to make the AND and OR operations using electricity and relays.


A good way to think about this kind of “active logic” is that all the operations are 
*comparisons* of the inputs, and the “logic operation result” is delayed a bit from the 
comparison. 


Next is an example that uses "NOT" to make a door buzzer (and if we squint at it, we 
can also see that this is a ticking clock). This is worth pondering for a bit. 


The left side shows the wiring and the state with the button/switch “open”/“off”/“up”.


On the right side we see what happens with the button “closed”/“on”/“down”. The
action of the electromagnet pulls the switch open/off, which interrupts the current; this
lets the spring pull the switch back closed/on, which gets the electromagnet to pull the
switch open again.


We will hear a buzz. The light will be flashing on and off. And the output will be a train 
of on-off signals (that we can use as a “clock” in making our computer). 


Note that this “logic” is paradoxical in standard mathematical logic 

x = NOT x 

but makes sense if we add the time delay needed in the physical world 
x(t+1) = NOT x(t) 


In fact, on a computer, it is this operation that *defines time* for the computer, and 
what “before” and “after” mean. 
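
The delayed version is easy to simulate. This short Python sketch steps x(t+1) = NOT x(t) a few times and records the signal; the function name and the choice of starting state are illustrative.

```python
def buzzer(initial, ticks):
    """Return the signal over time when each new state is NOT the previous one."""
    x = initial
    history = [x]
    for _ in range(ticks):
        x = not x            # x(t+1) = NOT x(t)
        history.append(x)
    return history

signal = buzzer(False, 6)
```

The result alternates off, on, off, on, and so on, which is exactly the train of on-off signals described for the door buzzer.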


The combination of NOT and AND makes the NAND operation, which can serve as a
universal building block for any computer.
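
As a sketch of what “universal” means here, the following Python code defines NAND once and then builds every other gate, and finally a small binary adder, out of NAND alone. The 4-bit width and the function names are arbitrary choices for the illustration.

```python
def NAND(a, b):
    return 0 if (a and b) else 1

def NOT(a):
    return NAND(a, a)

def AND(a, b):
    return NOT(NAND(a, b))

def OR(a, b):
    return NAND(NOT(a), NOT(b))

def XOR(a, b):
    n = NAND(a, b)
    return NAND(NAND(a, n), NAND(b, n))

def full_adder(a, b, carry_in):
    """Add three one-bit values, returning (sum bit, carry out)."""
    partial = XOR(a, b)
    total = XOR(partial, carry_in)
    carry_out = OR(AND(a, b), AND(partial, carry_in))
    return total, carry_out

def add(x, y, width=4):
    """Add two non-negative integers one bit at a time, using only NAND-built gates."""
    carry, result = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry

total, overflow = add(5, 6)
```

Only `NAND` touches Python's own logic; everything else, including the arithmetic, is wiring.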


Because the logic is independent of the materials, most computer designs are 
expressed just using logic symbols. 


Some early computers were made from relays, and they were pretty reliable — but 
were bulky, and required a fair amount of power. Vacuum tubes could do the same 
switching operation and were a lot faster, but had their own idiosyncrasies. Transistors
were smaller and low powered, and especially could be fabricated as circuits on silicon
wafers. 


If you are interested in this, there are a host of books and other online material, 
including interactive building kits. 


Alan Kay

Have designed a few programming languages - 2y 

What would happen if a computer was made for two users, with a monitor 
in common and two keyboards? Would it make programming more 
sociable? 


To just pick two gems from the past, take a look at Doug Engelbart and NLS, and
Hiroshi Ishii, especially his work at NTT and “Clearpane”.


Alan Kay
Have designed a few programming languages - 2y 


If advanced software is technically just made up of lots of really simple
components, why can't we replicate them using AI or no-code platforms?


Living things are made mostly from 6 simple atoms plus a few trace elements. What's
“interesting” is the unusual organization of them. This is also the main difficulty with
software, whether written by humans or other software (i.e. “AI”).


Alan Kay
Still trying to learn how to think better - Updated 2y 


How is an Operating System run on Transistors? Like how is software put on 
it and getting a graphical output from all of this? 


I've answered much of this question already on Quora. I've put a few of these links 
below. 


The way to start understanding this is to first see - A - how patterns can be 
represented by various kinds of “marks”, including in electrical devices such as 
switches, relays, etc. You don’t need transistors for this (they are just convenient). 


Second, is to see - B - how some electrical devices can receive signals, compare them, 
and do one thing or another depending on the signal. 


These two ideas can be combined into a system that has an “A Part” that will hold
marks that represent both “materials” (such as numbers) and “directions” (called
programs) that can be interpreted by a “B Part” that will manipulate the “A Part”
(including both the “materials” and the “directions”).


An “operating system” is just a collection of general programs that are written to help 
programmers write new programs. “Graphical output” is represented using exactly the
same kinds of markings used in the “A Part”, that are generally interpreted as a color 
for each dot on your screen (so there are a lot of these). 


For example, here is an actual “portrait” of the RAM memory of the Xerox Parc Alto. It
had 65K 16 bit words (128K bytes, ~ 1 million bits). A little less than half the memory is 
used for the 808x606 display with one bit per pixel, and a little more than one half is 
used for everything else (in this case Smalltalk, which combined an operating system, 
gui, interactive development system, and user media and other tools, including text, 
drawing, animation, music, and desktop publishing, etc.) 
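
The proportions quoted above can be checked with a few lines of arithmetic. This sketch assumes that "65K" means 65,536 words and that the display bits are packed with no padding per scan line (real hardware may round each line up to whole words, which would make the display share slightly larger).

```python
WORDS = 64 * 1024            # 65,536 words, read here as the "65K" in the text
BITS_PER_WORD = 16

total_bits = WORDS * BITS_PER_WORD      # about 1 million bits
total_bytes = total_bits // 8           # 128K bytes
display_bits = 808 * 606                # one bit per pixel
display_fraction = display_bits / total_bits

print(f"total bits:   {total_bits}")
print(f"display bits: {display_bits}")
print(f"display share of memory: {display_fraction:.1%}")
```

Under these assumptions the display takes a bit under half of memory, which matches the "little less than half" in the description.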


The display part of the memory looks just like what is put on the screen (part of the 
Alto hardware scans this continuously and makes the CRT display light up where there 
is an “on” bit), and the rest shows somewhat random patterns characteristic of what
SW looks like as bits. 
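
To make the "display is just memory" idea concrete, here is a toy one-bit-per-pixel display held in a list of 16-bit words. Everything here is an assumption made for illustration: the tiny 32x4 size, the names `memory`, `set_pixel` and `scan_out`, and the choice that the leftmost pixel is the high bit of each word. It is not a description of the Alto's actual layout.

```python
WIDTH, HEIGHT = 32, 4              # a tiny hypothetical display
WORD_BITS = 16
WORDS_PER_ROW = WIDTH // WORD_BITS

# The "A Part": plain memory words, all bits off to start with.
memory = [0] * (WORDS_PER_ROW * HEIGHT)


def set_pixel(x, y):
    """Turn on the bit in memory that stands for pixel (x, y)."""
    word = y * WORDS_PER_ROW + x // WORD_BITS
    bit = WORD_BITS - 1 - (x % WORD_BITS)   # leftmost pixel = high bit (assumed)
    memory[word] |= 1 << bit


def scan_out():
    """Read memory row by row, the way display hardware would scan it."""
    rows = []
    for y in range(HEIGHT):
        words = memory[y * WORDS_PER_ROW:(y + 1) * WORDS_PER_ROW]
        bits = "".join(format(w, "016b") for w in words)
        rows.append(bits.replace("0", ".").replace("1", "#"))
    return rows


for x in range(4, 12):
    set_pixel(x, 1)                 # draw a short horizontal line on row 1
print("\n".join(scan_out()))
```

A program "draws" only by changing bits in memory; the scan-out step knows nothing about what put them there.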


Alan Kay's answer to How was the first computer operating system made if there were
no programming languages?


Alan Kay's answer to How do computers deal with images, written texts, videos, etc.? 
Does it transform them into 0 and 1 codes? 


Alan Kay's answer to How were computer programs and languages like COBOL 
installed into computers back in the 50s before software? 


Alan Kay
Still trying to learn how to think better - 2y

What is the revolution of computers? 


Questions similar to this have been asked a few times (I've already answered two of 
them — take a look). And here's one of the answers: Alan Kay's answer to At OOPSLA 
1997 Alan Kay gave a talk titled "The computer revolution hasn't happened yet". What 
parts have materialized thus far and if not why not? 


Alan Kay
Have designed a few programming languages - 2y


What is a program, and why do computers need programs? 


A "computer" is any system that manipulates information, most often to help generate 
inferences. It was first used to refer to people who did this, especially in the case 
where the information was numeric and the inferences were arithmetical. 


Some of the earliest added tools were in the form of tally sticks, and eventually many 
variations of the abacus, in which the information is organized in the form of markers 
of some kind that conveniently represent numbers for arithmetical manipulations. 


The human part of this system is taught how to manipulate the tokens to e.g. add or 
subtract numbers. These added schemes of "do this, then do that, compare this to 
that, and if one way do this, etc.” can very well be called “programs”. 


This idea of having a general piece of machinery to which “instructions” of different 
kinds can be added to produce different behaviors was eventually put entirely into 
machinery. One of the earliest reported is the programmable rolling toys of Heron of 
Alexandria. 


This idea was used for the barrel organ and music box, where the general part could 
play any combination of notes in any sequence, and a program was set up to play 
each individual piece. 


The barrel organ influenced weaving, and there were a number of programmable 
looms made that were controlled by a similar idea and mechanism — the most well 
done, and famous, was the Jacquard Loom ca 1800. 


Babbage and Ada Lovelace recognized that the separate program idea, and the
particular mechanism of the Jacquard Loom, could be used to program a general
calculating machine that Babbage called the Analytical Engine. The idea that programs
could be written to do more than arithmetical calculations appeared (for example, to
write music), and “the more” included being able to create programs themselves.


Many stories and years later, Turing illustrated a theoretical mechanism that was 
“universally general”, in that the general mechanism could follow a program on a
mostly blank memory that described the operations of machines as general as itself. 
An important addition was to allow the machine to write symbols on the memory as 
well as to be able to read them. 


This meant that any such general machine could simulate and carry out the operations 
of any machine. 
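
A small sketch of the "fixed mechanism plus writable memory" idea, in the style of a Turing machine. The simulator `run`, the rule-table format, and the `INCREMENT` program are all hypothetical choices for this illustration. As a simplification, the rule table is held outside the tape, so this shows a programmable machine rather than a universal one that reads its program from the same memory.

```python
def run(rules, tape, head, state, blank="_", max_steps=1000):
    """Fixed mechanism: read a symbol, look up the rule, write, move, repeat."""
    cells = dict(enumerate(tape))          # memory that is blank everywhere else
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, blank)
        write, move, state = rules[(state, symbol)]
        cells[head] = write                # the machine can write as well as read
        head += {"L": -1, "R": 1, "N": 0}[move]
    span = range(min(cells), max(cells) + 1)
    return "".join(cells.get(i, blank) for i in span).strip(blank)


# A "program": add one to a binary number, starting at its rightmost digit.
INCREMENT = {
    ("carry", "1"): ("0", "L", "carry"),   # 1 + carry = 0, carry moves left
    ("carry", "0"): ("1", "N", "halt"),    # 0 + carry = 1, done
    ("carry", "_"): ("1", "N", "halt"),    # ran off the left end: new digit
}

print(run(INCREMENT, "1011", head=3, state="carry"))
```

Swapping in a different rule table changes the behavior while the `run` mechanism stays the same.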


The particular way this was organized — as a relatively small simple fixed mechanism 
— and an arbitrarily large general memory that could hold marks representing 
symbols that included both programs and plenty of further space for markings made 
by the computer — wound up being the plan for the first physical programmable 
computers starting in the late 1940s. 


To answer your question: a “computer” doesn't need to be programmable to be 
useful, but “programmable computers” greatly expand the ease of making computing 
mechanisms by providing (a) a fixed part that can read and follow instructions, and (b) 
sets of instructions — programs — that create desired behaviors when followed. 


Alan Kay
Still trying to learn how to think better - 2y


Is computer science a real science like physics or is it a Cargo Cult Science 
with fields like machine learning having no deep theory behind it ? Can you 
explain the reason why computer science is a science? 




I think we have to get to more precision of terms to approach this question. For
example, we need to agree at least on what we mean by “Science” and what is meant
today by “Computer Science”.


However we define “Science”, we need to include fields such as “cultural 
anthropology” and “suspension bridges”. 


I take “science” to mean the principled attempts to gather phenomena, to understand
them and how they came about, and to communicate those understandings in terms of -models-
that allow testing and criticism by matching them with the sources of phenomena.
One of the many properties that makes science so interesting and important — 
besides its primary approaches and methods — is that the actual “knowledge” in 
science is in the form of the “negotiations” between the models and the phenomena 
(we can think of this as “models are what our brain/minds can represent and handle, 
and phenomena don't care about us at all” — it is the relationships we can make 
between these that is the epistemological breakthrough of science). 


For “suspension bridges”, we can see that these exude phenomena that can be 
studied, modeled and tested, so there is no barrier for "Bridge Sciences” that stem 
from the bridges made by humans. * 


“Cultural anthropology” gives rise to considerable phenomena, but relatively diffuse 
models. This calls for deeply careful and serious “negotiations” — “the softer the field, 
the tougher you have to be”. Since science itself took hundreds of thousands of years 
to invent — and to get "deeply careful and serious” even about Physics — we can see 
that it is possible to “get tough” via will, and thus to get tougher still as we explore 
more complex systems. 


Now we have to deal with “computer science” as a term. It used to mean an aspiration
to be a true “science of the artificial”.* The term today most often is just an idiom for
practices that mostly should be grouped under “engineering” (however rudimentary),
and for which most in the field “don’t care what these terms mean” (sounds like quite
a few computer programs I've seen!).


But I was around when the “computer science” and “software engineering” terms were
coined, and have always taken them — as did the folks back then — as serious 
aspirations: pursuits to be grown into real fields without diluting either “science” or 
“engineering”. 


Here's a good example of “real computer science” from the late 50s. 


John McCarthy wanted to make an “Advice Taker”, a “real AI” interface agent that
could deal with humans in commonsense terms and reasoning, and which could both 
take and give “advice”. He wanted/needed a programming language that would be 
suitable for programming a “common-sense reasoning system”. 


Looking at computers and computer languages of the day (FORTRAN had just 
happened, and there was also an odd imperative list structured language around), 
John wound up making a *model of computing* that was represented in a kind of 
mathematical notation, and was a kind of math, and which used itself as a means of 
definition. You could use it to generate results from different parts of the theory. 


This definition was not too far from something that could actually be implemented on 
a computer of the day — and this was done. This created a new artifact — a kind of 
bridge that could be walked across and tested — and what could be done with it gave 
rise to phenomena that were different enough from both FORTRAN and its own first version
to prompt further modeling and to create stronger theories.


We have two important things to contemplate here. The first is a computer example of 
a "science of the artificial” and what a model in this science could look like. 


The second is a wonderful thing about the last few centuries: that the combination of 
science and engineering in the sciences of the artificial creates a delightful gorgeous 
pathway to both great art and great enlightenment. Things are made, they give rise to 
ideas about them, the ideas give rise to advances in the things that are made, etc. 


However, because “anything can have a ‘theory’ put underneath it” we have to judge 
any science — any set of beliefs, knowledge, theories, etc. — by the thresholds that 
are crossed — or not — of the models/theories, etc. 


I judge what McCarthy did to be really good.


If the area in question is about “intelligence”, I think we have to take this term very
seriously in order not to just start BSing. 


For example, we can have a lot of theory about "back-propagation” and think it good 
if the topic is “certain kinds of correlations” but, if the topic is “intelligence” in any 
reasonable sense of the term, we can’t think the current most popular theories have 
much content at all. 


Strictly speaking, a “Cargo Cult" is a belief system in which the believers think 
something good will happen if you can just find the right rituals to perform. This 
behavioral pattern is very human, and can be found everywhere, including the most 
so-called “advanced” cultures. 


I think I can see these behaviors in many places in computing. Can you?


* cf Herb Simon “The Sciences Of The Artificial” 


Alan Kay
Still trying to learn how to think better - Updated 2y


Is there anyone out there capable of writing a guitar tab from audio alone? 
I have a riff that I really want to play, but I only have 1 video recording of it
and think tab would help. 


There is a profession called “take downing” with pros called “take-downers” who listen 
to music and write out the musical scores of what happened, and what is needed to 
play it. Most musicians can do this to one extent or another — the pros can do it really 
well and really quickly. The fees are generally quite reasonable. 


Any good guitarist should be able to take the musical score for a guitar piece and 
invent one or two TAB versions that will help to play it. This is a bit more complicated 
because where things are played on a guitar often depend partly on the guitarist 
doing the playing. Sometimes there is just one way that might not be at all obvious — 
this can take some work to find. 


When I was playing guitar, I much preferred “piano notation” (concert pitch two staff
notation) to guitar notation (one staff written an octave higher — essentially as a 
tenor part), and both of these to TAB (which is the least musical and useful of the 
notations). I think most guitarists who play “developed music” (such as jazz and
classical) and who often use scores, will use the standard single staff guitar notation, 
and annotate some of the critical positions and fingerings for one or both hands. This 
works better than TAB IMO. 


These days my too old left thumb won't let me play guitar, and I've found the lute to 
be a really wonderful instrument: much more than a substitute — a cornucopia of 
expression. Most lute music was written in one form of TAB or another and — as 
above — this is occasionally useful, but most of the time is a pain, even when you get 
fluent in being able to read it. When learning a piece, I will usually work out the
standard notation (fits well on piano staff) for most pieces in order to better grapple 
with the musicality of the piece. Many modern lute book editions include a parallel 
piano notation of the old pieces for just this reason. 


Just for fun, here is an example of a lute piece from more than 400 years ago by John 
Dowland. He wrote the bottom TAB part, which has also been transcribed (in modern 
times) to a concert pitch piano score version placed on top. 


The TAB part is in “French notation” (one of many in use at the time). 


The six lines are the six “courses” (the lutes this was written for have mostly doubled 
strings for each “course”) and are organized as they are in today’s guitar tabs (so the 
highest pitched string is at the top). 


The tuning of the courses from the bottom to the top is almost like a guitar at the 
third fret (G C F A D G) — note that the major 3rd is one string lower. The frets are 
indicated with “a = the open string” (the “a” looks a bit like a short “d”), “b = fret 1”,
“c = fret 2” (looks a bit like an “r”). The “d” looks a bit like a “d” tilted to the left, etc.
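
The letter-to-fret scheme and the tuning above can be sketched as a small lookup. The assumptions are mine and are worth checking against a real edition: nominal open-course pitches G2 C3 F3 A3 D4 G4 (the actual pitch standard varied), a fret alphabet that skips "j", and sharps used for all accidentals. The names `TUNING`, `FRET_LETTERS` and `tab_to_pitch` are hypothetical.

```python
# Open-course pitches as MIDI note numbers, lowest course first: G2 C3 F3 A3 D4 G4.
TUNING = [43, 48, 53, 57, 62, 67]
FRET_LETTERS = "abcdefghiklmn"     # assumption: "j" is not used as a fret letter
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def tab_to_pitch(course, letter):
    """course 1 is the highest-pitched course (top line), course 6 the lowest."""
    fret = FRET_LETTERS.index(letter)          # a = open, b = fret 1, c = fret 2 ...
    midi = TUNING[6 - course] + fret
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


# Open top course, then fret 2 on the third course.
print(tab_to_pitch(1, "a"), tab_to_pitch(3, "c"))
```

This gives only pitches. As noted below, the durations come from the flags over the staff, and how long a note is held often has to be deduced.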


The durations of the “note slots” are given by the flags over the staff. A blank duration 
uses the previous one (note that this does not generally indicate held notes, so you 
have to deduce as well as can be). 


The bar lines have a slightly different meaning (a whole measure is a kind of a single
pulse, called a “tactus”). This invites more freedom within each measure (note the
larger size of the measures in the TAB).


Alan Kay
Still trying to learn how to think better - Updated 2y


Should web browsers have stuck to being document viewers? 


Actually quite the opposite, if “document” means an imitation of old static text 
media (and later including pictures, and audio and video recordings). 


It was being willing to settle for an overly simple text format and formatting scheme
(“for convenience”) that started the web media architecture off in entirely the
wrong direction (including the too simple reference scheme, cf. Doug Engelbart and
Ted Nelson). Circa early 90s, it had the look and feel of an atavistic hack. I expected
that Netscape would fix this rather than just try to dominate what was there (I
expected a better architecture both for “thinking about media in the age of
computing” and also something not like “an app” but more like an operating system
to deal with the actual systems requirements, demands, and scalings of the
world-wide Internet-in-progress).


It's both surprisingly and dismayingly difficult to get people — especially computerists 
— to criticize the web and the web browsers — even more so perhaps today. 


This is despite the glaring fact that the interactive media provided by the web and 
browsers has always been a hefty and qualitative subset of the media on the very 
personal computers that run the web browsers. 


At the time of the WWW’s inception in the early 90s, I made several
recommendations, especially to Apple (where I and my research group had been for
a number of years) and generally to the field. These were partially based on the
scope and scalings that the Internet was starting to expand into.


1. Apple's Hypercard was a terrific and highly successful end-user authoring 
system whose media was scripted, WYSIWYG, and “symmetric” (in the sense 
that the “reader” could turn around and “author” in the same high-level 
terms and forms). It should be the start of — and the guide for — the “User 
Experience” of encountering and dealing with web content. 


2. The underlying system for a browser should not be that of an “app” but of an 
Operating System whose job would be to protectively and safely run 
encapsulated systems (i.e. "real objects”) gotten from the web. It should be 
the way that web content could be open-ended, and not tied to functional 
subsets in the browser. 


I pointed out that, as with the Macintosh itself, these two recommendations,
which seem to be somewhat at odds, have to be reconciled. The first
recommendation would be the next stage in the excellent Macintosh “guidelines” 
about its user experience (Chris Espinosa and others have never been praised highly 
enough for this important work). These guidelines laid out the conventions to be 
followed for any app of any functionality — they are the parts that must be similar. 


The second recommendation was to reinforce the idea that the content to be run 
within the system had to be as free from the tools of the OS as absolutely possible 
(because special needs often require special designs, etc.). An example was that the
content needed to be able to generate its own graphics if necessary (even if the OS 
supplied some graphics tools). The more the content wanted to go its own way, the 
more its presentation to the users had to be made to conform to the standards in (1). 
As with any decent OS, it has to allow for new ideas while also providing the resources 
for safety, efficiency, and to manifest user experiences. 


If we squint at some of the implications of both of these, we can find a number of 
good principles from the past. One of them, as a real principle, I trace to the first
Unix systems at Bell Labs. The design was partly a reaction against the extremely 
complex organization of the Multics OS at MIT. One of the great realizations of the 
early Unix was that the *kernel* of an OS — and essentially the only part that should 
be in “supervisor mode” — would only manage time (quanta for interleaved 
computations) and space (memory allocation and levels) and encapsulation 
(processes) — everything else should be expressible in the general vanilla processes of 
the system. More functionality could be supplied by the resources that came along 
with the OS, but these should easily be replaceable by developer processes when 
desired. 


The original idea was to instigate as much progress as possible without incurring
lock-in to a huge OS, but to protect what needed to be protected and ensure a threshold
of system integrity and reliability. 


Sidebar: perhaps the best early structuring and next stage design of Unix was Locus 
by Gerry Popek and his researchers at UCLA in the early 80s. Locus allowed live Unix 
processes to migrate not just from one machine to another on a network, but to a 
variety of machine types. This was done by combining the safety required for 
interrupts with multiple code hooks in each process, so an “interrupt” could allow 
the process to be moved to a different machine and resumed with different 
(equivalent) code. It was easy to see that combining this with an end-user language 
would provide a network-wide system that would run compatibly over the entire 
Internet. Soon after arriving at Apple ca 1984, I tried to get them to buy Locus, but
the “powers that be” at the time couldn't see it. 


Note that when such a system is made interactive (e.g. using the sweeping ideas
from the ARPA/Parc research community), the end-users need to have a user
interface framework that is generically similar as much as possible over all applications,
and that this can conflict with the freedoms needed for new ideas and often new
functionalities.


So this is an important, large, and difficult design problem. 


My complaints about the web and the web browsers have been about how poorly 
they were thought about and implemented, and how weak are both the functionalities 
of web content and the means for going forward and fixing as many of the most 
critical mistakes as possible. 


One way to look at where things are today is that the circumstances of the Internet 
forced the web browsers to be more and more like operating systems, but without the 
design and the look-aheads that are needed. 


1. There is now a huge range of conventions both internally and externally, and 
some of them require and do use a dynamic language. However, neither the 
architecture of this nor the form of the language, or the forms of how one 
gets to the language, etc. are remotely organized for the end-users. The 
thresholds are ridiculous when compared to both the needs and the 
possibilities. 


2. There is now something like a terribly designed OS that is the organizer and 
provider of “features” for the non-encapsulated web content. This is a 
disaster of lock-in, and with actually very little bang for the buck. 


This was all done after — sometimes considerably after — much better conceptions of 
what the web experience and powers should be like. It looks like “a hack that grew”, in
part because most users and developers were happy with what it did do, and had no 
idea of what else it *should do* (and especially the larger destinies of computer media 
on world-wide networks). 


To try to answer the question, let me use “Licklider’s Vision” from the early 60s: “the 
destiny of computing is to become interactive intellectual amplifiers for all humanity 
pervasively networked worldwide”. 


This doesn't work if you only try to imitate old media, and especially the difficult to 
compose and edit properties of old media. You have to include *all media* that 
computers can give rise to, and you have to do it in a form that allows both “reading” 
and “writing” and the “equivalent of literature” for all users. 


Examples of how to do some of this existed before the web and the web browser, so 
what has happened is that a critically weak subset has managed to dominate the 
imaginations of most people — including computer people — to the point that what 
is possible and what is needed has for all intents and purposes disappeared. 


Footnote about “Ever expanding requirements at Parc” (prompted by Phillip 
Remaker's comment and question) 


When Gary Starkweather invented and got the first laser printer going very quickly, 
and at astounding speeds (a page per second, 500 pixels per inch), there was a push 
to get one of these on the networked Altos (for which the Ethernet had been 
invented). The idea was to use an Alto as a server that could set up and run a laser 
printer to rapidly print high quality documents. 


Several of the best graphics people at Parc created an excellent “printing standard” 
for how a document was to be sent to the printer. This data structure was parsed at 
the printer side and followed to set up printing. 


But just a few weeks after this, more document requirements surfaced and with 
them additional printing requirements. 


This led to a “sad realization” that sending a data structure to a server is a terrible 
idea if the degrees of freedom needed on the sending side are large. 


And eventually, this led to a “happy realization”, that sending a program to a server
is a very good idea if the degrees of freedom needed on the sending side are large. 


John Warnock and Martin Newell were experimenting with a simple flexible
language, called “JAM” (for “John And Martin”), that could express arbitrary
resolution-independent images, and it was realized that sending JAM programs
(i.e. “real objects”) to the printer was a much better idea than sending a data
structure.


This is because a universal interpreter can both be quite small and also can have 
more degrees of freedom than any data structure (that is not a program). The 
program has to be run in a protected address space in the printer computer, but it 
can be granted access to a bit-buffer, and whatever it does to it can then be printed 
out “blindly”. 
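
A rough sketch of "send a program, not a data structure". The tiny stack language below is invented for this example and is not JAM or PostScript; the function name `run_page_program` and the operators `add`, `mul`, `dup` and `rect` are hypothetical. The "printer" side knows only how to interpret tokens against its own private bit-buffer, and it learns nothing about the application that produced the program.

```python
def run_page_program(program, width, height):
    """Interpret a whitespace-separated stack program against a private bit-buffer."""
    page = [[0] * width for _ in range(height)]   # the only thing the program can touch
    stack = []

    def fill_rect():
        # Operands were pushed as: x y w h
        h, w, y, x = stack.pop(), stack.pop(), stack.pop(), stack.pop()
        for row in range(max(y, 0), min(y + h, height)):      # clip to the page
            for col in range(max(x, 0), min(x + w, width)):
                page[row][col] = 1

    ops = {
        "add": lambda: stack.append(stack.pop() + stack.pop()),
        "mul": lambda: stack.append(stack.pop() * stack.pop()),
        "dup": lambda: stack.append(stack[-1]),
        "rect": fill_rect,
    }
    for token in program.split():
        if token in ops:
            ops[token]()
        else:
            stack.append(int(token))              # anything else must be a number
    return page


# The sender computes the width (2 * 3) inside the program it ships.
page = run_page_program("1 1 2 3 mul 2 rect", width=8, height=4)
print("\n".join("".join("#" if bit else "." for bit in row) for row in page))
```

New document requirements then become new programs rather than new fields in a format that both sides have to agree on, and the same program could be run against a buffer of a different size or resolution.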


This provides a much better match up between a desktop publishing system (which
will want to print on any of the printers available, and shouldn't have to know about
their resolutions and other properties), and a printer (which shouldn't have to know
anything about the app that made the document).


"JAM" eventually became Postscript (but that's another story). 


Key Point: “sending a program, not a data structure” is a very big idea (and 
also scales really well if some thought is put into just how the program is set 
up). 


Alan Kay

Still trying to learn how to think better - 2y

What old computers were way ahead of their time? 

If the early 80s is “old”, then what are the early 50s or early 60s? 


“ca. 1950” Let me submit "Whirlwind", one of the earliest parallel architectures, and 
which was designed to be able to do real-time interactions, especially using graphical 
displays and pointing devices. It looks architecturally like a 16-bit mini-computer — 
the DEC ones in the 60s came from it — but required many racks of equipment, etc. 
The key technology that got commercial computing going — core memory — was 
invented on/for this machine. The SAGE air defense system sprang from this — and 
the Air Traffic Control system used today came from SAGE. Ivan Sutherland's 
Sketchpad was done on an experimental SAGE computer that came from Whirlwind, 
and was designed by one of the designers of Whirlwind. 


“ca. 1960” The Burroughs B5000, the first computer architecture in which almost
every feature was designed to run higher level languages, and safely. Today's
“byte-coded” virtual machines are software versions of part of the B5000’s hardware. It had
multiple processors, virtual memory, automatic process switching, capability
protection, and much more. It was basically a kind of “object-oriented machine”. I
once remarked that “7 of the 10 best software ideas first appeared in the hardware
design of the B5000” (this is a metaphor ... but has the deeper truth that metaphors
can sometimes express best).


Whirlwind’s designs established hardware patterns for the machines that came after it,
but the B5000 ran up against the inability of most HW (and most SW) people of most eras
to understand anything important about software. We used many of the B5000 ideas
in the Xerox Parc microcoded machines, both in "HW" and "SW", and they were one of 
the major reasons for the Parc successes. 


Alan Kay

I read a lot - 2y

What books would you recommend for an autodidact who is interested in 
the field of human-computer interaction? 

All learning — in the end — is autodidactic (*we* have to do the work to learn: no one 
can give us a transfusion). The main thing that is required for any learning is a sense of 
threshold — *where* you have to get beyond in process, knowledge, skill, etc. before 
you are actually fluent. Next you have to have a sense of sweep and range and depth, 
to attain a larger sense of the field. This is why most autodidacts who get above all 
these thresholds in a subject also have gone to school of some kind (this can help a 
lot, especially if it is not thought of as the center source of all knowledge). 


For UI I think there are likely to be more than 100 books that need to be absorbed,
and I apologize in advance for not sitting down and trying to put together a list.


Basically, you need to learn about human beings from a number of angles, most of 
them psychological and anthropological. And this will include wide excursions to 
books such as “Art and Illusion” by Gombrich, “Visual Thinking” by Arnheim, “The 
Magic Mirror” (about theater) by Nathan, and much more. 


“Portrayals and Visualizations”, e.g. Ed Tufte’s books, and Bret Victor's ideas and works.


You need to understand the nature of human communication, and also what it means 
for a human to get fluent in a new medium (so understanding what McLuhan was 
driving at is critical — and also some of his predecessors and successors: for example, 
Innis, Havelock, Ong, Postman, etc.)


Quite a bit of Jerome Bruner will help, as will Hadamard’s “The Psychology of
Invention in the Mathematical Field”. These are about “multiple ways of knowing and
thinking”. Kahneman’s “Thinking, Fast and Slow” and some of the “Behavioral
Economics” work that has come from this need to be included.


I got a lot, in general, from reading the turgid but important Korzybski, especially
“Science and Sanity”. And Koestler’s “Act of Creation” (in contrast, a smoothly written 
book that also requires real concentration). 


Learning about “intentions” for UIs is very important. For example, Engelbart’s original
1962 proposal, the JOSS “Designer's View” paper by Shaw, Teitelman’s MS and PhD
theses at MIT, Bret Victor again (and many more).


Learning about classical great UI designs is a must: from JOSS, to Sketchpad, to GRAIL,
to Teitelman's DWIM, to the work at Parc, to Hypercard, etc. I've given some
presentations about how the current pervasive “Parc GUI” got invented, including
many of its precursors, etc. Many of these are available on YouTube.


Brenda Laurel's book “The Art of Human-Computer Interface Design”, and the book
“Watch What I Do” by Allen Cypher are musts.


“Real AI” (cognitive AI) will be key in UI designs in the future along many important
dimensions. 


And much much more. 


You asked for books, but, as with the theater (which has much to do with UI and
vice versa), it is not possible to learn this without also actual experience in designing
and making UIs. It’s the combination of learning how to do hundreds of experiments,
plus what has gone before, plus the few important principles that are known, that 
creates fluency in this subject. 


Above all, you have to develop not just a sense of the general user but a great desire
(a sympathy) to help them. The first rule in UI design is “The End Users Are Not
Like Us!” For example, they don't do UI design, and most of them are not computer
folks, etc., so introspection is one of the worst approaches. (This is also true of theatrics
in general.) I have found that including children as users for everything I do has
helped greatly: I have more sympathy for children on the one hand, and children
are generally a bit more driven to learn than adults. Both of these help.


One of the biggest breakthroughs in UI design as it was getting invented was to move 
from trying to supply tools (and the equivalents of control panels for nuclear reactors 
for experts), to trying to make “learning environments” for the general public. It's 
ultimately like teaching the general public how to read while allowing them to exist 
and function in society. What are all the dimensions of these two very different kinds 
of requirements? 


This POV has greatly diminished in recent years, and the generally poor UIs around
now reflect this (e.g. where are UNDO and HELP on the iPhone and iPad, etc.?)


Alan Kay
I'm the “Alan Kay” in question (try Google for the usual misinformation) - 2y


How old is cloud computing? 


There are always contenders for “first” (which often cannot be reasonably sorted out),
so I like to use “early” instead.


I think of “cloud” as a name eventually given to what was earlier called an “information
utility” (as an analogy to the electric power, water, gas, telephone etc. utilities that
already existed). And I think of this as “information resources” provided from the
outside that can be used nearby in a number of useful ways.


The idea originated in the 50s, was heavily influenced by the SAGE Air Defense System 
(which time-shared huge mainframes devoted to tracking planes, and whose facilities 
were networked together), with eventually two concrete independent proposals
ca 1959 (John McCarthy and Christopher Strachey). As far as I can tell, an
“extremely early” working embodiment of the idea in a usefully general form was the
Compatible Time Sharing System at MIT ca 1961.


Many other systems followed. 


In 1962 JCR Licklider set up the ARPA “Information Processing Techniques Office” 
(IPTO) and the “ARPA Dream” was to boost human thinking via interactive computing 
pervasively networked over the world. Lick got quite a bit of funding from ARPA, and
early grants included one to MIT to really make CTSS practical (the ARPA project there was
called “Project MAC”, which had several readings, one of which was “Machine Aided
Cognition”).


IPTO expanded to more than 15 projects devoted to various aspects of the dream, and 
eventually the ARPAnet ca 1969 started to test larger scale networking with the 
possibility of multiple computers as part of the “information utility”. 


The story of ARPA-IPTO (which includes the later creation of Parc as part of it), is well 
told in the book “The Dream Machine", by Mitchell Waldrop. 


Alan Kay
Still trying to learn how to think better - 2y


What is the history behind Alan Kay's quote, “Point of view is worth 80 IQ 
points.”? 

We humans are set up by our nature — and the cultures to which our nature gives rise 
— to deal with much of our lives and memories in stories and story-like forms. 
Proverbs are very short stories, and as such they are much more memorable than 
careful descriptive and expositional paragraphs. 


A proverb at its best is actually a commercial for quite a bit more pondering. Many of
them, including this one, can include themselves as subjects.


And — as with many commercials in the pop culture — and with proverbs throughout 
history — most are instead used directly as short stories, and most especially to 
rationalize an opinion or event. 


For example, compare "Where there’s smoke there's fire” with “You can't tell a book by 
its cover". Most proverbial cultures have a proverb and story for every side of every 
event because these are used not for “truth” but for “resonance”, to be dipped into for
comfort and a kind of explanation that is the way people want to have something 
explained. This is how most religious tracts are used. 


So they are a double-edged sword at best. 


After not understanding Marshall McLuhan ca 1964, I was motivated to spend much of
the summer of 1967 taking another, deeper pass (a professor I wanted to talk to
didn't like to talk to grad students much, but he was a reader and was known to
mention McLuhan ... ). When the light finally dawned, I realized that McLuhan was one
of a very few who had “made something that was invisible more visible”.


As with many of these momentous uncoverings, the “invisible” was slightly visible but
deemed so unimportant that it was effectively not there. This was the idea (and fact)
that most of what we take to be “plain reality” is actually just beliefs, and most of
these are so taken for granted that we are almost never aware of them. We use them
in our reasoning and decisions but think our logic is absolute rather than relative to
the “context”, “perspective”, “point of view”, “world-view”, “paradigm”, etc. in
which we are operating.


McLuhan used this as part of his assertion that the learning of a 
communications/representation system must require the brain/mind to change (this is 
what learning means), and that the most important changes are the ones that are 
absorbed as “context” and rendered effectively invisible. He was initially interested in 
the *qualitative* differences in thought brought by writing and then by the printing 
press. And then he could see that television was likely to have an enormous impact on
humanity (and without most people being aware of the changes that had to happen). 


He was not a scientist, but once so prodded, any scientist and mathematician can see 
immediately that both maths and science were not just about *more* of something, 
but about vast changes in how things are looked at and thought about. *Methods* 
themselves could carry new points of view on their shoulders. This was already a topic 
of pondering — Kuhn etc — but without the pithy secret sauce of great aphorisms. 


Arthur Koestler had recently written a terrific book about creativity — "The Act Of 
Creation” — and that used the explicit idea that most creativity was being able to see 
an idea, not just as a weak form in an initial context, but as a strong idea in a strong 
context. He used the imagery of “planes of thought” at different angles. 


The ARPA research community in which I was luckily and happily embedded was
coming up with new perspectives, POVs, contexts, etc. for computing. The McLuhan 
expositions made these much easier to see and think about as part of what we should 
be doing as researchers. 


Part of the ARPA research context was drawn from “the implications behind Turing” as 
to “what computers actually *are*". Another important part was drawn from the Cold 
War (and previous hot wars) as well as from other disasters in the making, such as 
planet-wide problems with food, water, the climate, etc. 


If you try to make the invisible contexts visible, then it is easy to see that some of 
them are very positive with respect to advancing “civilization”, and others are terribly
retrograde. What people “believe is ‘reality’ " is the most important to put light on, 
especially if their “reality” doesn’t include the idea that “their reality is mostly just 
beliefs with no stronger foundations”. 


These thoughts persisted in this form until Parc, when I found myself having to explain
“stuff” to Xerox executives. One of the earliest ones popped out. I realized somewhat
grudgingly that they weren't “paragraph types”, and tried to find “punch lines” for the
talks and pop papers I started to have to come up with.


It's worth noting here that we need to use “change of context is worth 80 IQ points” 
on itself. For example: "IQ points” is a metaphor, and is not about what is officially 
considered "IQ". And there is no sign: no “plus” or “minus”, even though a moment's 
thought on contexts/POVs/etc will show that people can often have disastrously weak 
contexts for some of their thinking. (Witness history, the last few years, the last few 
weeks, yesterday, etc.) 


Similarly, the earlier “The best way to predict the future is to invent it” doesn't say 
anything about what kind of future. People in power often invent terrible futures and 
make them happen (again, just take a look at unnecessary disasters over history, 
especially recently). 


F = G * M * m / d² works in a context with a flat geometry and instant propagation
(Newton was quite sure the latter was not the case in our universe). It definitely 
worked well, and was soon taken to be absolutely true and was *believed as such*. 


This process is very similar psychologically to Joseph Campbell's observation that 
religions were generally founded by social geniuses who tried to find ways — many of 
them metaphorical — to talk about ideas outside of the general commonsense. Many 
of their less sophisticated followers *reified* the abstractions into what they thought 
were *actualities*, and believed them rather than thinking about them. 


But small parts of science are much more careful about beliefs than the general public 
(and even within science). Eventually enough careful work revealed tiny discrepancies 
in Newton's suggested relationship that required a very different context to give 
background to another compact (and more accurate) explanation. 


A great metaphor is not just a commercial for *more* thinking, but also can be a door 
into *stronger contexts* for thinking. And, it can also be used as an even more opaque 
blinder (especially if the proverb above is not used to help!). 


Alan Kay
Have designed a few programming languages - Updated 2y

What was it like programming an IBM 1401 back in the 60's? 

The IBM 1401 was the first computer I programmed as part of a real job (as an enlisted
man in the US Air Force at Air Training Command (ATC), Randolph AFB, San Antonio,
Texas).


In the early 60s, the USAF had a quite difficult aptitude test made for them by IBM to 
screen potential programmers for any of their computers. I took it for fun in early
1962, passed it, and was assigned to Randolph AFB, and to IBM training school for the
1401. The training was done in one intense week, wall to wall. 


The range of HW architectures was much wider back then, but the 1401 was still 
thought to be “odd”, and today would be considered very unusual. It had been 
developed to gradually subsume “punched card accounting machine” (PCAM) tasks 
(for which IBM was the world’s largest company). As with many businesses back then, 
Air Training Command had a number of “shops” of such machines, with perhaps 50 to
100 machines each, laid out over half an acre to an acre.


(I can't believe that Google won't yield a picture of one of these large “shops”;
perhaps someone can find one.)


A very small PCAM “shop” — imagine with 50+ machines in an enormous room 


The 1401 computer was a “character machine”, with a memory of 5K to 16K characters
of 6 info bits plus a “word mark” bit each. The word marks were set to indicate ends of
“words” (a word being a sequence of characters in memory addressed by the low order
character). A “data character” was (most generally) an encoding of a 12-row punched
card column: 4 bits to encode 0-9 and the other two bits to encode the “zone row bits”.
Note that this just gives you 40 combinations, but other tricks were used ...


To add two numbers, the 1401 essentially did elementary school arithmetic, 
progressing character by character until one or both word marks were reached. (This 
meant that the 1401 could be set up to add one half of memory to the other half.) 
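
A rough sketch of the word-mark addition described above, in Python. This is not 1401 machine code: the Cell class, the function names, and the rule that the B field's word mark ends the operation are simplifying assumptions made for illustration, and signs, zone bits, and overflow handling are ignored.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    digit: int = 0           # one decimal digit, 0-9
    word_mark: bool = False  # set on the high-order (leftmost) character of a field


def store_field(memory, low_addr, digits):
    """Store a digit string so that its last digit lands at low_addr."""
    start = low_addr - len(digits) + 1
    for offset, ch in enumerate(digits):
        memory[start + offset] = Cell(int(ch), word_mark=(offset == 0))


def read_field(memory, low_addr):
    """Read leftward from low_addr until a word mark ends the field."""
    digits = []
    addr = low_addr
    while True:
        digits.append(str(memory[addr].digit))
        if memory[addr].word_mark:
            return "".join(reversed(digits))
        addr -= 1


def add_fields(memory, a_addr, b_addr):
    """Add field A into field B one character at a time, low order first.

    Assumption: the operation stops at B's word mark; once A's word mark
    has been passed, A contributes zeros. Returns the final carry.
    """
    carry = 0
    a_done = False
    while True:
        b_cell = memory[b_addr]
        if a_done:
            a_digit = 0
        else:
            a_cell = memory[a_addr]
            a_digit = a_cell.digit
            if a_cell.word_mark:
                a_done = True
            else:
                a_addr -= 1
        total = a_digit + b_cell.digit + carry
        b_cell.digit = total % 10
        carry = total // 10
        if b_cell.word_mark:
            return carry
        b_addr -= 1


memory = [Cell() for _ in range(32)]
store_field(memory, 4, "0275")      # field A, addressed by its low-order character
store_field(memory, 12, "001980")   # field B
add_fields(memory, 4, 12)
print(read_field(memory, 12))       # prints 002255
```

Because the length of each operand is set only by where the word marks sit, the same loop handles two-digit or two-thousand-digit fields without any change to the "instruction".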


The 1401's memory cycle for one character was 11.5 microseconds ... so by computer 
standards of today it was both tiny and a snail. By PCAM standards, and tape 
standards, it was quite fast enough. 


It was essentially a two address machine, but the address registers persisted, so you 
could deal with “fields” on “cards” sequentially by e.g. doing one two address add — 
an opcode and two three digit addresses (which would be terminated by the word 
marks), and the next add would only require the opcode because it would use where 
the address registers wound up from the previous operation. (Instructions were 
processed in the left to right direction with word marks at the opcodes.) 


The 1401 was thus essentially a variable sized word and variable sized instruction 
machine. Getting lots of stuff done was greatly rewarded by careful planning. 


IBM 1401 - Wikipedia: a nicely detailed article.


The IBM Reference Manual we used (the version here is accidentally from early
1962, just about exactly when I started programming the 1401).


The punched cards to be worked on were read by a card reader, converted to tapes, 
processed, the results went back to tapes, were printed, and/or punched back into 
card form. 


IBM eventually made and sold well over 10,000 of these machines (the first mass 
produced computer in history). 


The IBM 1401 “mainframe”. 


1403 Printer 


1402 Card reader/punch (good for putting program listings on top of) 


IBM 729 Tape Drives — with vacuum tape loop buffering 


The 1401 we worked on at ATC had 8K bytes of memory and six very fast tape drives 
for those days. There was no OS, and most debugging was “desk checking”. You could 
get “3 minutes” once a day to try a program, but you couldn't touch the machine — 
you put your listing on top of the card reader and asked the operator to do this and 
that (making friends with the machine operators was critical). 


The basic learning process was to do the one week hardware school, after which you 
were quite able to program anything on the 1401 (this was one of the fruits of the 
difficult aptitude test — it probably overscreened, but everyone who did pass it had 
absolutely no trouble learning a whole tricky machine and its machine code 
successfully in a week). 


Then you spent some months “programming” — which was called “coding”. What was 
called “programming” back then was design and flowcharting the design. The PCAM 
machines were all flowcharted, and these were dataflow schemes. These had to be 
serialized a bit to be converted to computer processing. 


A “coder" was thus a human “compiler” — whose job it was to convert the higher level 
programming language of flowcharts to working machine code. 


After a few months of this “on the job training” and assessment one was allowed to do 
some design, programming, and then coding for new tasks. There were already 
seasoned (a year or so) programmers — also mostly enlisted men — who were 
generally very helpful. 


There was another larger computer there — the Burroughs 220 — and learning it and 
doing similar kind of work on it improved the programming and coding for both 
machines. 


A few other things I should mention.


It was possible to be extremely clever with the 1401, and it should be obvious that the 
wordmark scheme allowed clever memory allocation schemes, overlays, etc. 


(It is well worth your time to take a look at Val Schorre’s 1964 “Meta II” system at
UCLA, a compiler-compiler that would run in an 8K 1401. I wish I had known about
this when I was a 1401er, but its invention happened after I went back to school.)


We were able to make a batch operating system that would run all the shop jobs, and 
which fit into the top few hundred characters of memory (and using tape to help). 


IBM had an extremely capable “tailored macro” assembler — Autocoder — that 
allowed a wide range of expression and “conditional assembly” to produce highly 
optimized and compact code. 


IBM already had a policy to “make everything run on everything”, and this meant that 
both the relatively new FORTRAN, and the even newer COBOL had to be compilable 
and runnable, even on the smaller configurations. The latter never made it into the 
ATC shop while | was there, but there was a FORTRAN “for curiosity’s sake” (this was 
because the compiler took about 100 passes (or “pulls” of the tape drive) i.e. a very 
long time). 


Reports Program Generator (RPG) was used for a few jobs (I tried it a few times along 
with others). It made a few vanilla tasks quicker, but the macro library we had 
developed was generally more convenient, and only a little more dangerous. 


The older larger slower vacuum tube Burroughs 220 did have something really 
interesting: the BALGOL compiler (Algol 58 with a few Burroughs touches). It had one 
of the most beautiful manuals, and this got me to read it. One of the officers had been 
a CalTech grad and had used it there. At ATC, it was also a “curiosity” but as a much 
“sweeter and cleaner” example of a higher level language. 


The next machine in line for ATC was to be a Burroughs B5000, a truly amazing 
machine whose hardware was made directly for an Algolic language (Algol 58 really), 
and all of whose software — including the OS (it had one) — was written in higher 
level form. I learned this machine from the documentation, only understood about
half of it, and was back to college before it showed up. (A few years later, by
accident, one of my grad school profs was Bob Barton, the main inventor of this
marvelous design.)


I was able to get back in school for the fall of 1963, and the AF let people out up to
100 days early to not miss a term. I was able to get a good part time job programming
for the National Center for Atmospheric Research (NCAR) in Boulder, that would pay 
my tuition and room and board for the rest of my undergrad years at the U of 
Colorado. 


Alan Kay
Have designed a few programming languages - 2y

If one starts with a state of the art object-oriented programming from the 
late 80s and early 90s (Smalltalk, Self, CLOS), what could have been the next 
breakthrough in that style of programming, but never happened? 

For September 1984, I was invited to write the cover article for the Scientific American
special issue on “Computer Software” (back in the heyday when SciAm was a much
more serious attempt to bridge the gaps between science and the general public).
Within the bounds of this audience, I took a look at some present and future
possibilities.


My friend and colleague Doug Lenat was invited to write the article on “Computer 
Software for Intelligent Systems”, and my friend and colleague Larry Tesler was invited 
to write the article on "Computer Software and Programming Languages”. 


It was extremely tempting to try to convince Scientific American to go to the next 
levels of explanation to include a much larger look at the possible futures for software 
and programming. This didn't pan out. 


I included a category in my article, “Ultra High Level Languages”, which
suggested that Lenat's “Eurisko” system was “almost a new thing” that could bridge
to “really new things”.


The tongue in cheek predictive scheme I had come up with for the article predicted
the past well, but the next “new thing” didn't show up. Later I thought this was
because the extreme commercialization of personal computing in the 80s and the 
Internet in the 90s had brought forth a large new ad hoc group of people who wanted 
in on computing. The weak commercial HW, the new crowd, the “from zero again” 
contexts, the lack of ARPA-like funding, etc. all conspired to knock expected advances 
into oblivion. 


The research perspective in the 60s and 70s was about qualitative improvements in 
programming itself, not in extending a particular style (as the current question 
requests). I still think this is what the discussion should be about, because the real
problems are about how to program computers more readily, not about what is 
currently in place and/or popular. 


We have to start with the problems of intent: determining, capturing, debugging, 
eventually optimizing, etc. We can think of intent as an assertive manifestation of 
meaning. Whenever I thought about this back in the 60s, 70s, 80s, etc. I would look to
see what the “real AI” people were doing (I have to use this term because “AI” has
been “colonized” today by what was a small subset of the larger idea back then).


Our thinking about this can be helped when we note that human beings have a very 
difficult time being crisp about what is sought for, and especially to communicate this 
to others if they think they have it down. The established fields within Engineering 
have made great strides over the last 40-50 years by inventing and using CAD tools to 
incrementally capture and visualize the desires, coupled with SIM tools to vet the 
designs in real-world situations. There has also been great progress in the third 
member of the triumvirate: FABrication, and the pragmatic optimizations needed to 
go beyond what simulations can do. 


A supreme irony is that these powerful tools in pretty much all forms of real 
engineering (there are even CAD<->SIM->FAB tools for bioengineering) are done
on computers, and most of what they can do can only be done on computers. 


But when we look to how “computing” comports itself today, it is hard to see anything 
remotely at the level of the serious engineering disciplines. In the terms above, a vast 
percentage of software seems to be done as stabs at FAB only. Some of the 
exceptions are interesting. For example, at the large scale end of industrial computing, 
I've seen documents of “requirements” in the form of lists, but most with no diagrams 
at all, even though large parts of the requirements are for UI. There is no attempt to
vet the requirements, and no sense that they even make sense in the large. 


And so forth. The breakthrough that is needed in programming has to come from a 
larger sense of what it is supposed to be about i.e. it is about the whole system that is 
needed to do programming that can be done and trusted in the engineering sense of 
doing and trusting. 


As Doug Engelbart pointed out in 1962, you have to create not just “augmented 
individuals” but “augmented groups of augmented individuals” in order to be serious 
about next steps. The augmentations are not just computers and computer tools, but 
also education, new methods and languages for representation: these have to reflect 
back into the “augments” that can be manifested on the computer parts of this more 
complex organization. 


Alan Kay
Still trying to learn how to think better - Updated 2y


What is your review of The Dream Machine: J.C.R. Licklider and the 
Revolution That Made Computing Personal? 


Bottom Line: The best single book about the inventions of personal computing and 
the Internet, as judged by those who were involved. 


It's the book I always recommend when people ask for accounts of how much of
today’s interactive networked technologies and software got invented. 


During a discussion of this book, one of the old hands said “The mistakes kind of 
cancel out” — meaning that the larger sweep of the story is pretty accurate, and was 
able to capture some of the spirit of the people, the vision, and the work. 


The author, Mitchell Waldrop, did more homework with the actual community
than any of the others (about 5 years' worth) and was able to come up with a
workable sequence for the story (this is really difficult, and the other attempts were
more confused).


The book starts with the large contexts, including WWII, that got the US government 
deeply involved in science, engineering and mass industrialization for many 
technologies, and it properly centers the main story around JCR Licklider, who was a 
special catalyst, carrier of visions, creator, and funder of the large research community 
needed for many different kinds of new inventions in computing and networking. 


I wrote a tribute to this research community in 2004, “The Power Of The Context”,
that is both relatively short and has a pretty good bibliography for people who are
interested in a bit more detail on different parts.


A book to read afterwards, especially for stories, is Michael Hiltzik's “Dealers of 
Lightning”. This book is not as well organized as “The Dream Machine," and its style is 
very much more “The Hero's Journey” (and this misses some of the most important 
points). However, Hiltzik was able to get some very good stories in his interviews. 


All of the attempts at historical accounts missed capturing one of the most important 
elements of this community: its ability to cooperate in so many ways. Part of this was 
due to actual intentions and actions of the funders, and part was due to the particular 
nature of the grad students and interns (then young PhDs) that ARPA created as
part of its “research results”. A very large number of these from the many ARPA
projects wound up as friends and colleagues, and as PIs themselves. This was
powerful for the culminating phases of personal computing and the Internet, where
much of the main work was done by the “second generation” researchers ARPA
created.


Alan Kay
Still trying to learn how to think better - 2y


Why is Machine Learning so hard to explain? 


Here is an explanation of “conditioned learning” for complete beginners: “How would
one explain AI to a complete beginner?”


Alan Kay
I am the Alan Kay in question. - 2y

What would Alan Kay think about a future U.S. President immediately 
scaling up the circular economy? 


The technical issues are many, longstanding, and systems related — and there are 
many complications from most Americans not understanding them, or the grave 
dangers the world has been in for several decades now. If this were a conventional 
war, the enemies have already been allowed to land and establish beachheads. 


In the “situation room” of a conventional war — as with other highly complex systems 
— there would be many kinds of visualizations of what seems to be going on, and 
from other angles, there would be simulations of the future under many kinds of 
conditions including the many different kinds of things that need to be addressed and 
done. 


The main thing that needs to be avoided when a complex vital system is under stress, 
is to topple it. The basic idea is that, for a non-linear system (most of the interesting 
and important ones are) it generally requires much less energy to topple than to 
restore it. 


The “barely stable” configuration here can handle small stresses, but a bit more will 
topple it, and now orders of magnitude more energy will be required to restore it.


At this point, 57 years after the first scientific warnings of the climate problem, it is 
essential above all other things to reduce global warming, and most of this will require 
reducing the greenhouse gases in the atmosphere (not just CO2, but methane, etc.).
Methane is especially a problem because there is a lot locked up in the tundra and in 
the oceans that the current global warming is starting to release. 


Even if there were no politics that have to be dealt with, this is now quite a difficult 
complex problem — not just to do, but especially to understand and minimize the 
collateral impact of the tradeoffs of the transition. In WWII meat and other goods 
were rationed, and it is likely that forms of rationing of energy will eventually be 
required. 


A tiny part of the larger problem — with special focus on tradeoffs, and even benefits 
— is treated in an excellent paper by stellar engineers Saul Griffith and Sam Calisch. 
https://www.rewiringamerica.org/s/Jobs_White_Paper_Compressed_Release.pdf 


An effort of the size needed to deal with the primary crises would have many spinoff 
benefits and new forms of organization. One of these would be something like a 
“circular economy”, but one in which the components are mostly also new rather than 
trying to repatch the current system (which is very far from the aims, the 
organizations, and the sensibilities needed). 


But the first order of business is to head off further global warming as quickly as 
possible, and this will require not just planning to solve the problem as though people 
don't exist, but the larger part of the planning will have to be about how to engineer 
transitions that will balance the overall threats and solutions in ways that will minimize 
harm. 


This will be the most complex challenge of science and systems thinking ever
attempted. The extremely poor response of people and governments to the relatively
easy to understand COVID-19 threat is quite worrisome, given that the climate crisis is
still much more invisible than the current pandemic, despite the decades of warnings.


The bottle that is the Earth could tip over. 


Alan Kay
Have designed a few programming languages - 2y

What is the best way for 2 independent programs to communicate with 
each other? One program reading while other program writes to a file? Or 
opening a local TCP connection and communicate through it? And which is 
commonly used method? 


This looks like a class homework problem. In any case, it’s the kind of question for 
which many useful answers are readily available via Google, StackOverflow, etc. 


Alan Kay
Have designed a few programming languages - 2y


What is declarative programming? 


One way to think about this is the distinction between “hows” and “whats” (and 
perhaps a middle ground of “almost whats"). 


Historically, programming started in the “how” realm, with the program as the tactics 
for making the desired effects from the materials and tools available. This would be 
equivalent to instructions for building a house step by step, but without blueprints. 


Almost always in the realm of practical construction, there is a “what” in mind, and 
often some work is put into manifesting this before the construction is to begin. 


An example of this would be plans for a house, or even a careful scale model, that 
represent part of a strategy for getting the house made. In both cases the plans are 
“declarative whats” and humans can often interpret them well enough to organize 
tactics to build the actual goal. The tactics/hows will often refer and compare to the 
strategic plan/whats to judge how well things are proceeding. 


In programming, we usually start with at least a vague idea that is mostly “whats”. And 
we have some ideas about how to test the programs so far against these. We usually 
have less process to vet the “whats” themselves, so it is often the case that the 
resulting working program will not be what was actually desired (this occasionally 
shows up in house building as well!). 


A perfect collection of tests (where if and only if all the tests succeed then the
program is what was desired) could possibly be used as a declarative program. In
practice, the tests are not complete, and there are generally no paths to ensuring the 
adequacy or the completeness of the tests — and most importantly, it is very difficult 
to just turn tests into a running program (for either humans or other programs). 
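
A toy illustration of the point about incomplete tests, in Python. The candidate "programs" and the test values are made up for this sketch; a real search over programs would be enormously larger, which is part of why turning tests into a running program is so hard.

```python
# A tiny, hand-picked space of candidate "hows" (an assumption for illustration).
CANDIDATES = {
    "x + 1": lambda x: x + 1,
    "x * x": lambda x: x * x,
    "x + x": lambda x: x + x,
}


def satisfying(tests):
    """Return the names of all candidates that pass every (input, expected) test."""
    return [
        name
        for name, fn in CANDIDATES.items()
        if all(fn(arg) == expected for arg, expected in tests)
    ]


# The intended "what" is doubling, but these two tests do not pin it down.
weak_tests = [(0, 0), (2, 4)]
# One more test rules out the impostor.
stronger_tests = weak_tests + [(3, 6)]

print(satisfying(weak_tests))      # prints ['x * x', 'x + x']
print(satisfying(stronger_tests))  # prints ['x + x']
```

With the weaker test set, squaring passes just as well as doubling, so a working program that satisfies the tests need not be the one that was desired.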


This could be like a list of requirements for a house, but with no model, or drawings, 
etc. What is needed is a combination of tools and approaches that allow the 
conception of the result to be developed in a way that will also allow an automatic 
program to figure out how to write an efficient program that can find the “hows” from 
the “what”. 


One of the earliest successes was to be able to express numerical mathematics in the 
“whats” of standard mathematical expressions that showed desired relationships but 
didn't show how they could be computed. The “George” system on Whirlwind was one 
of the first, and FORTRAN followed soon after. 


Ivan Sutherland's Sketchpad system (ca 1962-3) was a startling groundbreaking 
system along almost every dimension of hardware and software (and I've written 
about it in other Quora answers). For here, it was one of the very first interactive 
systems to combine design and realization entirely in terms of “whats” — one showed 
the system “what you wanted” — both in terms of actual pictures and in terms of 
“constraints” specifying what had to be “true” — and Sketchpad would combine and 
solve all into an instant solution, which could then be further modified until the 
desired artifact/model was reached. It was thus not only one of the very first CAD 
tools, but also integrated SIMulation to vet the Computer Aided Design. 


This was monumentally wonderful, and a breathtaking tour de force by a real genius. 


Warren Teitelman famously named one of the earliest great attempts to support the 
tension between the “whats” and the “hows”: DWIM (for “Do What I Mean”,
implemented in a series of Lisps, for which he was also the main designer and 
implementer). 


He invented one of the earliest (and deepest) forms of UNDO to deal with many kinds 
of human errors. He did one of the earliest and best designs to realize something like 
McCarthy's “Advice Taker”, and a programmer could do quite a few important things 
by “Advising” the Lisp system in progress. 


It's worth noting that the “catch” in declarative approaches is that it is very difficult to 
create a declarative language that will guarantee that anything that can be said in it 
will have a solution and that there will be enough solutions to make using it to be 
worthwhile. (Consider that you can say many things in math that are hard or 
impossible to actually reach, and that for some of them, the system will not be able to 
determine some of the ones not reachable.) 


To this end, one of the niftiest things that Don Knuth did when devising TeX was to
design the constraint language so that most of what was reasonable to express in it
was also within the reach of his solver (a form of “dynamic programming”).


An early example of a purely declarative programming system was the old IBM 
“Report Program Generator” (RPG), an attempt to provide a completely declarative
system for “producing business summaries”. It worked well enough to survive for quite 
a few generations, but started losing out to COBOL almost immediately, just because 
COBOL could “do more”. But this was at the expense of the creation of vast and 
terrible COBOL programs, many of which are still plaguing businesses (and costing 
fortunes) to this day. 


SQL was an attempt to be “sort-of-declarative” to allow desired sets of elements to be 
specified more or less “declaratively” and have the system figure out the actual 
operations to accomplish these. To me this is a parallel to FORTRAN having declarative 
expressions for arithmetic and comparisons, but having much of the rest of the system 
deal with "hows" rather than “whats”. 


It's worth noting that a good spreadsheet system is “pretty declarative” and it is fun to 
see just what can be done in the large with a more general version of the idea (quite a 
bit actually). I have written about this here and there over the years.
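
To make the “pretty declarative” point concrete, here is a minimal sketch in Python of a spreadsheet-like evaluator. The `Sheet` class, its methods, and the cell names are all hypothetical and invented for illustration; real spreadsheets are far more elaborate. Each cell states what it is in terms of other cells, and the evaluator works out the order of computation.

```python
class Sheet:
    """Hypothetical toy spreadsheet: a cell is a constant or a formula."""

    def __init__(self):
        self.cells = {}

    def define(self, name, value_or_formula):
        # Declaring a cell says *what* it is, not *when* to compute it.
        self.cells[name] = value_or_formula

    def get(self, name, _seen=()):
        if name in _seen:
            raise ValueError(f"circular reference through {name}")
        cell = self.cells[name]
        if callable(cell):
            # The evaluator, not the user, decides the order of evaluation.
            return cell(lambda other: self.get(other, _seen + (name,)))
        return cell


sheet = Sheet()
# Order of declaration does not matter: "total" is declared before its inputs.
sheet.define("total", lambda ref: ref("price") * ref("quantity") + ref("tax"))
sheet.define("tax", lambda ref: 0.25 * ref("price") * ref("quantity"))
sheet.define("price", 4.0)
sheet.define("quantity", 3)

print(sheet.get("total"))      # 15.0
sheet.define("quantity", 10)   # change one "what" ...
print(sheet.get("total"))      # ... and the result follows: 50.0
```

Nothing in the declarations says how or in what order to compute anything; that “how” lives entirely in `get`, which could be replaced by a faster strategy (caching, for example) without touching the cells.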


A very interesting “middle ground on the side of declarative programming” was the 
“recursive functions of conditional expressions” that is pure Lisp. There is some “how” 
thinking in the planning of the form of the declarative expression, but the results were 
often quite elegant and could be “read declaratively”. This was also the case for the 
later Prolog: many of its programs could have a perfect declarative reading, but most 
of them had to be carefully planned so that Prolog could execute them successfully. 


Still, I think there is virtue here. It is very like requiring the writer of an expository essay
to do quite a bit of work to have it turn out to be readable and appear simple. And 
this gets us back to the design and the debugging of the design issues that continue 
to be so difficult for many in computing to face up to these days. 


Another middle ground that hasn't been explored deeply enough over the last 25
years or so is the attempt to create “runnable meanings” (aka “runnable math in the
large”) to be able to nail down desired semantics — the “whats” — with a complete
separation from the optimizations — the “hows”. Unit tests don’t really capture this,
unless a running system can be synthesized from them. It’s worth looking at the Nile
language by Dan Amelang to see a modern version of “runnable math” which has
great power of expression, and which allowed formerly large systems — such
as all of the 2-2.5D rendering of personal computing — to be programmed in less than
500 lines of clear and understandable code (its “mathematical entropy”).


Another idea with “runnable math” is that the meanings can represent the 
CAD<->SIM cycles of deep engineering, to create running models (perhaps requiring 
supercomputers for real-time requirements), and the FABrication part is done as a 
parallel organization of methods, heuristics, strategies and tactics, etc that are 
required to actually deploy the system in a practical way. The basic idea here is that 
you should be able to turn off any and all optimizations and the only thing that might 
happen is the system might slow down and use more storage, etc. 


Next, we should consider something like the “TurboTax” version of programming. This 
doesn't require the programmer to be god-like, because it asks questions and 
provides formats for answers that it can understand, and then synthesizes a program 
from what it has gleaned. An extremely impressive example of this a few years ago 
was a system done by CYCorp for the Marines to automatically write official 
procurement documents (these are enormous, and have so many requirements that 
only a few people in the Marines could write them — the backlog was crushing). One 
of the subtasks in this system was a wonderful system that normalized 20 different 
data-bases of different kinds and vintages so — for example — terms like “employee” 
which had different meanings in the data-bases could be rendered into a canonic 
meaning. This subsystem was presented to the rest of the system as a single data base 
with a highly semantic way to query it, etc. 


In many ways, systems such as the above are likely to appear more and more, and 
they will likely do a better job of holding their territories than RPG did 60 years ago. 


One way to look at a great leap forward would combine a carefully designed language 
for “whats” with a vast collection of "hows" and “solvers” and have a mediator — some 
future version of CYC? — that can act as a “semantic organizer” and “semantic 
operating system” — to find and organize the “hows and solvers” into larger 
composite solvers for given requirements. (It is quite interesting to reflect that this is 
just what Sketchpad did as one of its many amazing properties: it had three different 
solvers, plus code to decide which ones to use and how to retreat to more general 
and slower ones to eventually get the job done.) 


I hope I don't sound like a complete old fart and fuddy-duddy if I note that the taste
for these really hard, semantically rich, problems seems quite lacking in the field 
today, and especially in the universities where one would hope that the funding, 
encouragement and freedom to deal with hard problems would still strongly exist. 


Alan Kay
Still trying to learn how to think better - Updated 2y

Should schools be remote learning only until fall 2021? 
New section added Jan 29th 2021 


The answer I gave below unfortunately resembles statements like: “if everyone would
just 100% isolate and wear masks, we would soon end the pandemic”. It misses most
of what is actually going on, and what is needed to deal with the even larger problems
brought by inadequate responses.


For example, under most circumstances, “English as a second language” children need 
a strong external environment that features English as an everyday means to 
communicate. And most children are “mathematics as a second language” children, 
etc. This can be learned in some of the ways I mention below, but in the practical
actualities of today, it is not going to happen for most children. 


But I still think it is quite wrong at this point to put in danger the adults involved with
schooling in physical schools, even if it might be the case that children spread the 
disease less. 


Some children will not have access to computers hooked to the Internet. I think there
are possible solutions for them at home that combine printed materials and cell 
phones — and that using more printed materials is an important idea even if there are 
computers available. 


The key principle between now and the time when it is safe for children to attend 
physical schools again, is almost certainly “structure” that is supported by media, 
much of which does not have to be online. But I do think that critical parts of the
needed “structure” will require at least cellphones or landlines of some kind. I don't
see how to do this without some parental cooperation (and I think this puts many
children in a terrible situation). 


I shall try to answer what lies behind this question — which in part has to do with how
learning can be done, be aided, and most especially shaped for children. 


First (with a molecular biology degree lurking in my misspent youth), even with the 
advent of vaccines, it is very likely that extreme care will need to be taken with the 
pandemic in the US and the UK until July 2021 or later. 


Second, it is very important to realize that the disaster many countries have faced 
(and will continue to face) was to a large extent avoidable. The virus has to have 
human hosts to reproduce and a means of getting from human to human. Denying 
these for about 6 weeks over an entire population will stamp out the threat. 


Only a few countries were able to face up to these facts (which anyone who has 
understood their 8th grade biology course should deeply understand). For example, 
New Zealand did bite the bullet and face up to a massive solution, and wound up with 
just 25 deaths (about 5 deaths per million). For comparison the US in just a few weeks 
will reach 1000 deaths per million — most avoidable — and this is still rising. This 
factor of 200 in deaths represents a kind of “ratio of ignorance” and lack of thinking 
skills. 


Simple Answer: This almost certainly means that the prognoses of experts such as Dr 
Fauci — that things will not start getting back to “real normal” until the third quarter 
next year — are likely to be correct — and that means that children should not be 
attending school until Fall 2021.


Third, if we include recent poor decisions with regard to the climate and with regard
to what I'll call the systems integrity of a country (including politically), we can see that 
we've had a massive failure of public education, which has produced a majority of 
adults (some of whom are also parents, teachers and politicians) who are currently 
quite incompetent with respect to the time in which we live. 


Fourth, despite the fact that children do most of their learning outside of school — especially
at home — and genetically are set up to model their parents, too many of the parents 
have decided to outsource the main parts of modern education to schools (and are 
generally happy to have the schools also work as baby sitting processes as well). 


Fifth, the outsourcing also has a larger public purpose: the two main reasons for 
having a universal public education are highly complementary: (a) to help create adults
with enough knowledge and perspective to be able to be voting citizens in a 
representative democracy, and (b) to provide children with high level learning 
experiences regardless of the background or predilections of their parents. 


A high quality educational experience for children should even things out, but the 
current situation is not very high quality, works for some, but many of the differences 
are made up according to the backgrounds of the parents. Here is a recent study at
Northwestern that looks at the pandemic’s effect on public school across different
cohorts of the society: https://faculty.wcas.northwestern.edu/~mdo738/research/ADSZ_Covid_1220.pdf


A Larger Answer: My premise in what follows is that health — which includes “mental
health” — is the top priority, and that most other needs will eventually be aided more
by good health in all areas than bad. 


But there is also the rock and the hard place, in that many parents (a) are currently 
caught up with multiple jobs and are struggling to get by (b) have used the schools to 
free up their time, and/or (c) are not well prepared themselves in many subjects, and 
(d) nor are they prepared for actually helping their children learn. 


The practical real question now becomes: how can we help every part of this 
quandary? 


I think putting “Children First” is a good place to start from (even though they depend 
on their parents etc for surviving each day). 


When I was a child in the 40s and 50s I was impressed by how my two cousins
(contemporaries in age) did “schooling” while part of a family that sailed a large 
square masted sailing boat around the world for 18 month voyages. This was done 
through the “Calvert Correspondence School” and worked well enough for one of 
them to go to MIT and the other one to Amherst College. 


The process back then involved reading and writing, with “motivation” and a bit of 
help supplied by the parents. 


The reading and writing had much in common with the “tutor” methods of the elite 
British universities, where it is all about the learner doing most of the work, with 
feedback and guidance from the tutor. 


Most of the help came from the replies of heavily annotated versions of their own 
writings from the Calvert teachers/"tutors”. The parents would help interpret these and 
keep the kids on track. The process was very efficient (as most organized home 
schooling can be): no more than 2 hours a day is required to deal with much more 
material much more efficiently than most schools can or do.


In non-pandemic times, those who home school can then spend much of the rest of 
each day exploring museums etc, reading for fun (if you don’t read for fun you will 
not develop the chops to read for purpose), clubs & sports, etc. for social contact, 
etc. 


Back to the sailing boat: of interest here is that sometimes an exchange would take 
more than a month by mail if the boat e.g. were in the South Seas. This did not hinder, 
and often seemed to help (like Bruner's “Spiral Curriculum” it provided a way to review 
ideas and take another tack through the materials). 


The question mentions “remote learning” — these days I think meaning “via computer
that connects to something” — but did not say more about its form. The key idea in
pedagogy is that there is no known way to pour new knowledge and skills in the
learner's ear. The learners have to create these from the processes in the environment
they are put in. “Typical school" is far from a good learning environment, and “remote 
versions of typical school” are likely to be much much worse. 


But we should also be able to expect/require parents to provide “guided motivation" 
for learning frameworks/processes set up by others. The UK has a national curriculum, 
but as far as I can tell, the current government has not taken upon itself to provide the
parents with the materials they need to help their children learn, whether
pandemic or not, remote schooling or not.


The US is set up to have the states to some level have a “state curriculum” that the 
many local school boards are supposed to follow. As an example here, the state of 
California — which has about the 6th largest economy in the world if it were a country 
— has not provided the parents with what is needed for helping their children learn in 
both normal and abnormal times. 


This comes back to the largest question of what to do if the parents aren't inclined or 
able to help their children, even through motivation, and it was the public schools that 
were making a difference for these children. 


And this takes us back to how I started this too long piece: the general inability of the
public to understand the pandemic, and its replacement of science with “hoping” and
“projections of beliefs” has not only resulted in a health disaster, but also disasters in 
the many ancillary requirements, including children’s education — and much of this 
because of the failure of the existing education systems to better prepare these adults 
in the public to make much better decisions in the face of crises. (This is a more than 
worrisome downward spiral.) 


“Many important problems don't have a half-life”, so “amortizing them into the future”
will eventually incur crushing reckonings. Most citizens balk at paying taxes to
maintain roads, bridges, schools, and other infrastructure — and their health. But the
“interest costs” — what the Mafia would call “the vig” — of the amortizations are
huge, and many of these are in the process of coming due.


The “interest costs” of producing ignorant unprepared “so-called-adult"-citizens hit at 
the direct heart of their society. The last years have revealed some of the dire 
consequences along a number of important dimensions. 


One of many things that needs to be done to start turn-arounds, is for the levels of 
government that are appropriate, to officially characterize all real schooling as “home 
schooling” (perhaps aided by schools) and to prepare the materials etc. to help 
parents really take seriously, and learn, how they can make a qualitative difference, not 
just for their children, but actually to wind up saving the entire world (do I have to
again mention the still almost-invisible-to-most climate crisis whose import dwarfs the 
pandemic?) 


Alan Kay
Have designed a few programming languages - 2y


What are some historic examples of elegant, streamlined operating 
systems? 

A fabulous super-efficient elegantly designed HW/OS was "Project Genie”, an ARPA 
Project at Berkeley in the mid to late 60s. Project Genie - Wikipedia


It is most famous for three big uses of it. The first was that it turned out so successfully 
that ARPA wanted more, and more or less forced SDS to build a commercial version 
(the Wikipedia article is sanitized and inaccurate). This was used to form the Tymshare 
Corp, the first commercial time-sharing company. 


The second was that this machine and OS was the basis for the Engelbart NLS system 
and was the computer that sustained “The Mother of All Demos” in 1968. 


The third was that the design of the OS was copied by BBN and used to make the 
Tenex OS for the DEC PDP-10. 


A few notes here. They had a small SDS 930 computer (64K 24-bit words, 0.5 MIPS) and
modified it to add their own home built memory protection and page relocation
hardware, etc. Mel Pirtle was the main factor for this. Chuck Thacker (of later Parc and 
Turing Award fame) was one of the builders of the HW. 


The OS was quite ingenious for any day, and was primarily the design of Butler 
Lampson (also of later Parc and Turing Award fame). Peter Deutsch (of Lisp and later 
Parc fame), did much of the critical programming. 


It had several features that encouraged programmers to let the system know in 
advance — when they could — their intentions, and in return they would be rewarded 
by having their program run on the fast queue (and if they violated their predictions, 
they were punished by being put on the slow queue). 


A unique feature (I don’t know of any exceptions) is that the working set given to any 
process was only about 1/4 the available physical memory. This allowed the OS to 
bring in the next three processes from the disc/drum secondary storage and thrashing 
was pretty much eliminated. 


Another interesting wrinkle to the design is that Butler "designed the system to crash” 
— meaning: there were crashes of computers in those days, so instead of pretending 
that all was well, Butler took the opposite tack and set things up so that a crash could 
be recovered and resumed in a few seconds or minutes. One of the schemes was to 
mark pages as dirty or clean and to use a few percent of the scarce resources to
preemptively write out dirty pages, so that there was almost always enough clean 
storage to bring pages in. 


There were many other interesting features for inter-process protection and 
communication, and the particular way the “user machine illusion” was presented to 
programmers. 


Alan Kay
Have designed a few programming languages - 2y


What kind of thought process do you need to understand programming? 


Even if you are programming in a declarative language, you are organizing a structure 
of interacting relationships whose realization on a computer will cause desired effects 
to happen. 


The next levels of thinking can feel very different, and to such a point that many 
“programmers” only choose a few ways to think. For example, classic imperative data- 
structure-munging programming thought is very different than classic recursive “Lisp- 
type” programming thought — the first is very direct, and straightforward, and also 
prone to many kinds of errors — the second is “strategic”, subtle, and also yields far
fewer bugs. “Real OOP” design and programming is rather different from either of these,
but most good versions of it resemble the second. 


However, for the purpose of this question, I think the simplest dimensions to ponder
are (a) the one that stretches from the tactical to the strategic, (b) the one that 
stretches from “tinkering” to “real engineering design”, and (c) the one that stretches 
from the ad hoc to the mathematical. 


A much more subtle, and ultimately all important, dimension is (d) the one whose rich 
direction points to “real science”. This is where the kind of thinking can be done that 
leads to “real inventions” and “really new perspectives” on what programming could 
mean. 


Alan Kay
Still trying to learn how to think better - Updated 2y


How do computers deal with images, written texts, videos, etc.? Does it 
transform them into 0 and 1 codes? 


Yes. A good way to think of this is that a binary digit — a bit — is not really a “0” or a
“1” but just “off” or “on” (or “not there” or “there”). In other words, it is like a simple
mark on a wall or paper. So a bit is more intuitively represented by symbols that look
more like “not there” and “there”, for example: ○ and ●


It's the combination of “there”/"not there” that provides the way to represent ideas. 
Also see (Analogy) below for an analogy to markings on paper. 


So, eight of these together can signify 256 different ideas (there are two ways to pick 
each of the eight “bits”). 


The way to think of this is like a dictionary, where the pattern of eight bits is like a 
word to be looked up, and something else (often us) has the meanings. 


These 256 patterns could be a representation for 256 different numbers, or 256 
characters in an alphabet, or 256 colors (or just 256 shades of red), or 256 levels of 
volume for a sound, etc. 


When we want to use a sequence of bits to represent a number, the order of the 
patterns is chosen to make both comparing (larger or smaller) and arithmetic (+, -, 
etc) as easy (and as fast) as possible for the hardware. 
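
The “dictionary” idea above can be sketched in a few lines of Python. This is only an illustration: the function name `interpretations` is made up, and the specific lookups are assumptions chosen for the example (two's complement for signed numbers, Latin-1 for characters). The answer itself does not name any particular encoding.

```python
def interpretations(pattern: int) -> dict:
    """Look up one 8-bit pattern in several different 'dictionaries'."""
    if not 0 <= pattern < 2 ** 8:
        raise ValueError("pattern must fit in eight bits")
    raw = bytes([pattern])
    return {
        "marks": format(pattern, "08b"),
        "unsigned number": pattern,
        # Two's complement: one common ordering of patterns for signed
        # numbers (an assumption here, picked because it keeps + and - easy).
        "signed number": int.from_bytes(raw, "big", signed=True),
        "character": raw.decode("latin-1"),
        "shade of red": pattern / 255,  # 0.0 = none, 1.0 = full
    }


print(2 ** 8)                       # 256 distinct patterns from eight bits
print(interpretations(0b01000001))  # the same marks read five different ways
print(interpretations(0b11111111)["signed number"])  # -1
```

The bits never change; only the dictionary used to look them up does.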


Just for picture elements (“pixels”), usually at least four of these patterns made from 8 
bits are used: one for each primary: for 256 shades each of red green blue, and the 
last for 256 degrees of transparency. 


Visible text is made from a combination of a pattern for the alphabetic character —
for example, the letter “a” — and a pattern for the visual representation on your
computer screen — a picture made from “picture elements” as described in the
previous paragraph (this might require about 1000 of them organized in a rectangle).
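
As a rough sketch of the four 8-bit patterns per pixel described above, the following Python packs red, green, blue and transparency into one 32-bit number and takes them apart again. The function names are hypothetical, and the order of the four channels within the 32 bits is an assumption; real systems use several different orders.

```python
def pack_rgba(red: int, green: int, blue: int, alpha: int) -> int:
    """Combine four 8-bit channel values into one 32-bit pixel."""
    for channel in (red, green, blue, alpha):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must fit in 8 bits")
    # Assumed layout: red in the top 8 bits, alpha in the bottom 8.
    return (red << 24) | (green << 16) | (blue << 8) | alpha


def unpack_rgba(pixel: int) -> tuple:
    """Recover the four 8-bit channel values from a 32-bit pixel."""
    return (
        (pixel >> 24) & 0xFF,
        (pixel >> 16) & 0xFF,
        (pixel >> 8) & 0xFF,
        pixel & 0xFF,
    )


orange = pack_rgba(255, 165, 0, 255)   # fully opaque orange
print(format(orange, "032b"))          # the 32 marks for this one pixel
print(unpack_rgba(orange))             # (255, 165, 0, 255)
```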


Similarly, a sound is made from a sequence of “volumes” (“amplitudes”), typically 
around 44,000 (or more) for each second. 


How many bits need to be used for each amplitude (that approximates the relative
height of a wave form at that point in time)?


Using 8 bits — so 256 heights (+ or - 128 heights from zero) — can work but will also 
introduce some noise. 16 bits (+ or - 32,768) can often work, but still can have noise 
(and there are many tricks for trying to make this work). A stereo channel would 
contain a sequence of these for each ear. Today, 24 or 32 bits (or more) are used to 
represent the amplitude for each “sound element” in each channel to get the noise 
level so low it can't be heard (sometimes these will be carefully reduced back to 16 
bits to save space). 
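
The relation between bits per amplitude and noise can be seen in a small experiment. The Python sketch below is an illustration under stated assumptions, not how any particular audio system works: it rounds one second of a 440 Hz sine wave (sampled 44,100 times) to the nearest available height and measures how far the rounded wave strays from the original. The function names and the test tone are invented for the example.

```python
import math


def quantize(sample: float, bits: int) -> float:
    """Round a sample in [-1.0, 1.0] to the nearest of 2**bits heights."""
    levels = 2 ** (bits - 1)            # 128 for 8 bits, 32,768 for 16 bits
    code = round(sample * levels)
    code = max(-levels, min(levels - 1, code))  # stay in representable range
    return code / levels


def rms_noise(bits: int, rate: int = 44_100, freq: float = 440.0) -> float:
    """RMS error introduced by quantizing one second of a sine wave."""
    total = 0.0
    for n in range(rate):
        sample = 0.9 * math.sin(2 * math.pi * freq * n / rate)
        total += (sample - quantize(sample, bits)) ** 2
    return math.sqrt(total / rate)


for bits in (8, 16, 24):
    print(bits, "bits:", 2 ** bits, "heights, rms noise", rms_noise(bits))
```

Each extra bit doubles the number of available heights, so the rounding error (heard as noise) shrinks quickly as bits are added.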


(Analogy) A way to reconcile all this to the world of human senses is to note that all 
we need is paper and the presence or absence of ink to represent words, pictures, 
music, etc. What's on the paper are patterns that are supposed to signify meanings 
that we can “look up" in our minds. The computer versions of these are quite similar, 
with the really interesting and added property that the computer can read and write 
its own markings (to make “a book that can read and write itself”). This is extremely
important. 


Alan Kay
Have designed a few programming languages - 2y


Since programming ultimately comes to controlling computer hardware 
(even if we don't usually think of it that way), how is it that the same 
programming languages work on many different types of computers with
different hardware? 

A good way to start thinking about this is that most computers are “universal”: 
meaning that given enough memory and time, they can compute anything that can be 
computed. This means that — for any imagined effect — there exist combinations of 
each computer's hardware instructions that can cause that effect. 


For example, a computer might not have a multiplication instruction, but if it is
universal, a combination of the instructions it does have can be contrived to multiply
two numbers together.
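
One way such a combination might look is the classic “shift and add” routine, sketched here in Python. The only operations used inside the loop are addition, shifting by one bit, and testing a bit, which is roughly what a machine without a multiply instruction would have to work with. This is an illustrative sketch limited to non-negative integers, not the routine of any particular computer.

```python
def multiply(a: int, b: int) -> int:
    """Multiply two non-negative integers using only add, shift and test."""
    if a < 0 or b < 0:
        raise ValueError("this sketch handles non-negative integers only")
    product = 0
    while b != 0:
        if b & 1:          # lowest bit of b is "there"
            product += a   # add the current shifted copy of a
        a <<= 1            # shift left: a doubles
        b >>= 1            # shift right: drop the bit just examined
    return product


print(multiply(6, 7))      # 42
```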


In the large, this means given enough memory and time, any computer can simulate 
any other computer “to the bit”. 


Most programming languages aim to present a “convenient computer” for humans to 
program in. The features are in terms of *ideas* small and large — for example, most 
of them will provide numbers and a multiplication operation. 


The job of the program that translates from the *ideas* to a particular computer is to 
make sure that the meaning of each idea is rendered accurately using the low level 
instructions for each computer (these can look very different, but their *effect* will be 
the same). 


In computerish terms, the “front-end” of a programming language will look the same 
from computer to computer, but the “back-end” will be specific to each specific 
hardware used. 
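
A toy version of this split can be sketched in Python. Everything here is hypothetical and invented for illustration: two imaginary stack machines (one with a `MUL` instruction, one that must call a software multiply routine), a shared front-end that reads the same source text for both, and a back-end that emits different instructions for each. Python's standard `ast` module is used only as a convenient parser.

```python
import ast


def front_end(source: str):
    """Shared front-end: source text -> tree of ('num', n) or (op, l, r)."""
    def convert(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return ("num", node.value)
        if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
            op = "add" if isinstance(node.op, ast.Add) else "mul"
            return (op, convert(node.left), convert(node.right))
        raise ValueError("only integer + and * are supported in this sketch")
    return convert(ast.parse(source, mode="eval").body)


def back_end(tree, has_mul: bool):
    """Machine-specific back-end: tree -> instruction list for one machine."""
    if tree[0] == "num":
        return [("PUSH", tree[1])]
    op, left, right = tree
    code = back_end(left, has_mul) + back_end(right, has_mul)
    if op == "add":
        return code + [("ADD",)]
    return code + [("MUL",) if has_mul else ("CALL_MULTIPLY",)]


def run(program):
    """Simulate either imaginary machine and return the final result."""
    stack = []
    for instruction in program:
        name = instruction[0]
        if name == "PUSH":
            stack.append(instruction[1])
            continue
        right, left = stack.pop(), stack.pop()
        if name == "ADD":
            stack.append(left + right)
        elif name == "MUL":
            stack.append(left * right)
        else:  # CALL_MULTIPLY: software multiply by repeated addition
            total = 0
            for _ in range(right):
                total += left
            stack.append(total)
    return stack.pop()


tree = front_end("6 * (3 + 4)")
program_a = back_end(tree, has_mul=True)    # machine with MUL
program_b = back_end(tree, has_mul=False)   # machine without MUL
print(program_a)
print(program_b)
print(run(program_a), run(program_b))       # different code, same effect
```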


Alan Kay
I am the Alan Kay in question. - 2y


What is Alan Kay's MBTI type? 


Astrology posits 12 “types”. But this particular sorting doesn’t work well, so “epicycles” 
are introduced, and these allow pretty much any observed fact to be rationalized to 
the system. 


Myers-Briggs is more interesting in a number of ways, but also eventually introduced 
“epicycles” e.g. the blending of the oppositions to allow more explanations to seem to 
work. Similar kinds of paragraphs to astrology, etc. 


All of these are mainly part of the human need for resonant explanations (even if the 
explanations are just stories and not in accord with close observations) that has been 
part and parcel of cultural/religious practice in human history. 


The “psychometrics” *idea* has more worth, if it can be used only for what can be 
done with it so far. (And many might be surprised to know that one of the inventors of 
psychometrics — besides Carl Jung — was Timothy Leary.) 


For example, if you stick strictly to just 4 dimensions and oppositions, you will wind up 
with 16 categories. These aren't enough to deal with normal ranges of human 
variation, so do they provide any help at all, or is this just another astrological scheme 
in modern dress? Or a different form of tribalism with overgeneralizations that can be 
dangerous? 


One “type” that has been examined in the 20th century (and recently) is “the 
authoritarian type”, and I think there is something to this. Myers-Briggs devotees
might be able to find a way to squeeze this into MBTI, but it seems like a real
dimension of human personality — and it fits with William James’ notion in the 19th
century of “tender-minded” and “tough-minded” types. I don't think this quite works
with e.g. P/J. 


My conclusion is that these ideas — as with many anthropological and genetic ideas 
— can be useful if packaged with considerable sophistication, but they are likely to be 
dangerous otherwise. 


One technique for “sophistication” is to collect different perspectives on an idea. One 
of my favorites is the book “Maps of the Mind” (Charles Hampden-Turner), with 30 or
so different views. Some of these might be more accurate than others, but it is the 
collection of them all that really helps thinking about the complex processes between 
our ears (and to some extent in other parts of our bodies). 


Along these lines, in a comment below, Robert Mudry gives us a link to another trait 
analysis. I was vaguely aware of this from decades ago but had forgotten it. If you put
it in collection with other schemes, the group of perspectives does help think about
and talk about this complex subject: Big Five personality traits - Wikipedia


Alan Kay 
Have designed a few programming languages - 2y 


Since everything can be represented by objects, what is the successor of 
Smalltalk? What system could be comparably minimal, yet more powerful? 


Kernel systems that are written in themselves — like Lisp or Smalltalk — can morph 
into any new idea and claim that “the potential” was always there ... 


But just to start readers thinking about this, a number of the “details” of how Smalltalk 
was implemented could be vastly improved today. 


Just to name five of quite a few ... 


• the messaging apparatus could be made to more easily deal with the many
kinds of messaging “negotiations” that are needed when things are scaled up
considerably from the size of things we dealt with at Parc


• the internals of objects and their interrelationships could be much more like
(a) the address spaces they were intended to be, and (b) to recursively hold
systems of objects in communication (c) to minimize dependencies at every
level


• a more flexible approach to the forms of messages (and hence to the syntax
of the language) could be devised — I would look at “coordination
languages” (like Linda) and “relational languages” (like Concurrent Prolog) for
inspiration


• a more semantic and specific approach to “time” could be installed: for
example along the lines of McCarthy's “fluents”, Reed's (and Croquet’s)
pseudotime, etc. This could also be used to deal with a much larger and
comprehensive set of ideas about “doing” and “undoing”, “possible worlds”
reasoning, etc. (along the lines of “Worlds” by Warth and Ohshima).


• a stronger approach to meaning, ontologies, etc. (a) defining meanings, (b)
running meanings, (c) separating optimizations from meanings


And there are more. 


Such a “Smalltalk-20" could be gradually made out of one of the existing Smalltalks 
that has full reflection and something like SLANG in it to bootstrap the kernel (e.g. 
some Squeak derivative). 


Alan Kay
Have designed a few programming languages - 2y

What are the five features of the object-oriented paradigm that you
consider to be important for good software engineering practice?

“Engineering” is the *principled* process of creating working systems that are in
accord with both our immediate goals and requirements, and also in accord with 
contextual requirements (such as integrity, safety, predictability, maintainability, etc). 


Putting “Software” in front of “Engineering” is a bit misleading, since it is the latter that 
has to dominate in order for the two terms to mean something in combination. So I
like to use EOS ("Engineering Of Software") instead. 


Systems are often described in terms of parts relationships connected by 
intercommunicated interactions (because most systems have some non-linearities that 
don't allow analysis into completely independent parts). 


Much of design has to do with finding parts relationships that allow (a) the 
interactions to be minimal (on the one hand), and (b) to allow the interior complexity 
of a part (usually also expressed as a system) to also be as minimal as possible. 


This suggests some useful tools: 


• a “part” construct that can protect its inside from its outside, and vice versa


• a “communications” construct that can convey interactions and deal with
dependencies


• a “system” construct that is a combination of parts and communications that
can fit in a part recursively, and everything is made this way


• the “messages” that are communicated are also in terms of the systems


• the system that is being made, is made with the same kinds of system


• changes can be made safely to the system without having to rebuild it and
while it is running


• and so forth


These tools allow some useful abstractions to be devised, including: design of 
algebras of communication (now called polymorphisms), various approaches to 
ontologies using analogies, composition, etc. 


These also count as tools, but live in a more complex world of human goals,
intentions, ways to describe things, etc. 


Notice that all these only help design by being able to encompass what designs and 
systems and descriptions are. They have the same kind of usefulness as language, 
mathematics, and other kinds of media. They are hugely useful, but the designers still 
have to understand how to organize the materials to create “art” both with a small “a” 
and a capital “a”. 


The “Oop” that was done in the ARPA/Parc community (e.g. Smalltalk) was just such a 
collection of tools. We still had to do a lot of design, but it was able to represent our 
designs without too much added noise, and this allowed the iterative processes of 
designing and building to be done relatively easily to result in lively small robust 
evolving systems that were close to what we intended. 


Some of the things not carried through as far as they could be had to do with dealing 
with large scalings (e.g. this can be done better in Smalltalk than we did, but our
priorities were to get maximum expression within the tiny computing resources we
had in the 70s, and these aspects worked out well with our approach). 


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - Updated 2y


How much computer science knowledge has been lost because it was not 
published or documented, or no working code has been shared? 


If truly lost, then we can't know enough to answer the question in any era. 


Given that the vast majority (this is an estimate from personal experience) has never 
taken the trouble to even type “Engelbart” into Google and look closely at even
the first five hits, I could ask “What does it matter if knowledge is being lost, if the
knowledge that hasn't been lost is being ignored?” 


In the US, this is similar to the free public library problem: great knowledge is in there, 
but you have to step in the door and get to work. This is particularly painful because 
the source of many of these public libraries was Andrew Carnegie, and part of the 
funds donated required each library to have at least two special rooms: one for the 
teaching of reading, and one just for children. 


Both of these were because Carnegie was a child laborer sent to the US who had the 
good fortune that one of his bosses opened his home library on the day off of his 
laborers so they could make use of the books there. Carnegie educated and grew 
himself there, and never forgot. 


This is worth comparing with the almost blind building on top of the Internet by 
people who (a) don’t know about Carnegie libraries, and (b) don’t realize that part of 
what needs to be supplied are the modern equivalents of each of the two special 
rooms in every Carnegie library. 


Andrew Carnegie had context and helped others to acquire it, but the web today — in 
a memorable phrase — is “the context of no context”. 


Alan Kay
Have designed a few programming languages - 2y

What is Seed7 programming language about? 

With programming languages, quite a bit of assessment should be done by writing 
various kinds of systems (and especially “reference systems”). I haven't done that, so
I've little to say here. 


I did find that Seed7 is closely based on the theses of Thomas Mertes, and I do
remember (to some extent) these from the 80s. 


I think there are three or four “ideas worth contemplating” in the aims of this design,
one of them being to try to incorporate a useful kind of syntax extension along with 
the semantic extensions gotten by various kinds of modules — e.g. subroutines, forms 
of objects, etc. 


Most extensible languages — including the ones we did at Parc — lack “fences”, which 
should require some form of key to be allowed to use certain kinds of extensions (that 
can be very useful, but also create both dangers and messes if used willy nilly). 


It would be great to see a modern extensible language done in the light of what is 
actually known today about what is needed. 


Alan Kay
I am the Alan Kay in question. - 3y


Does Alan Kay know about Brightmark? The company that is taking 
recycled waste and making renewable fuel. The company already partnered 
with BP. If not, I hope this reaches him.


Yes, I am aware of Brightmark, but not in detail.


For the last several years I have been helping The Ellen MacArthur Foundation of the
UK, a non-profit that is highly concerned with the climate, waste, circular economy,
etc. (Circular Economy - UK, USA, Europe, Asia & South America - The Ellen MacArthur
Foundation)


The world is almost 60 years late addressing these deep issues (first scientific climate 
warning by NSF was in 1963), and only a fraction of what is needed is being done 
today. 


Alan Kay
Still trying to learn how to think better - 3y 


I've been quite interested in playing the lyre, but I have no knowledge on it
and since it’s not a very popular instrument, there aren't guides. My
question is, what's a good amount of strings? I see 7 string, 10 string, 16
string etc., so which? 


There are guides on YouTube, and some very good examples of different kinds of 
lyres. There is a family in Greece that makes historical replicas. Take a look. 


Alan Kay
I am the Alan Kay in question. - Updated 3y


Does Alan Kay know or even have met Russell Ackoff on Idealized Design? 


Interesting question! 


My introduction to systems thinking outside of computing came from a few science 
fiction books in the 50s, reading a lot of the Cybernetics literature, being a molecular 
biology major in the early 60s, and then reading quite a few books that were listed in 
the "The Whole Earth Catalog", mostly classics e.g. by Bertalanffy, Waddington, and a 
now classic “Notes On A Synthesis Of Form” by Christopher Alexander. 


Within computing, complex systems design was a hot topic in the 60s for operating 
systems, computer networking, and large complex applications e.g. Sutherland's 
“Sketchpad” system. 


There were a few emergent languages in the 60s that took a systems and simulation 
view — for example, Simscript, Simula, SOL, and Sketchpad again. 


I think the first systems thinker who was outside of computing or cybernetics that I
met was Peter Senge, and he operated in the same general “management guru” space
that Russell Ackoff did. However, I did not read any of Ackoff's books back then or
meet him. As far as I can tell, none of his early books were mentioned in the Whole
Earth Catalog.


As a result of this question, I got and read three of his essays, and ordered the
“Idealized Design” book (and will report on this after looking at it). Ackoff's essays
were good. The early one, written in 1971 for the journal “Management Science” (a very
“interesting” term all of its own), is expository, and should have been very useful for its
intended audience (I seriously doubt that any but the tiniest fraction read and
understood what he was trying to get them to see). The later two essays were written 
after decades of not being able to get business people to take systems views or learn 
about systems. They parallel Peter Senge’s experience very closely. 


I have a copy of Peter Senge’s excellent book “The Fifth Discipline” (mostly about
systems thinking for management) here in London and could not find Ackoff as an
entry in the index.


OK, I have now gotten and read the “Idealized Design” book, and some other writings
(Ackoff wrote many books and papers). I also found that my friend Peter Senge wrote
a tribute to Ackoff after he died at age 90 in 2009. And that Peter Drucker also 
acknowledged Ackoff as a major influence in changing some of his thinking about 
management. 


There is too much to sum up for a Quora sized answer, but here are a few thoughts I
had while looking at this material that was new to me. 


First, “Systems” should be more of a field than it is: it has a wealth of “phenomena of 
interest", and is manifested in many kinds of ways in the world, and at most scalings. 
As it stands now, “systems thinking” is explored and expressed at more local and 
restricted levels than it deserves. A particular local milieu is “management studies", 
where Ackoff (and Senge) spent much of their careers both studying and attempting 
to improve. We could add in Jay Forrester and "System Dynamics” as another branch 
of this within “management”. 


Second, the fundamentals of “Systems” have been rediscovered many times, 
sometimes with great insights and sometimes with less understanding. A real field 
could really help the foundations of this kind of research and thinking. 


Third, much of what's important about “Systems” and “Systems Thinking” has fallen on 
deaf ears pretty much everywhere in the wide variety of localized environments that 
have sprouted discoverers and proselytizers. This includes not just “management” but 
also to a surprising extent: “computing”. In all areas, even when there is evidence that
some people know better, sound principles are violated for reasons of expediency and 
with the *hope* that the side effects will not be too damaging (these hopes have 
generally not been fulfilled). 


Basically, most people want to be able to think about things separately, as though 
there are no connections to other things. And they are all too willing to allow large 
scale (even planetary scale) death and destruction to be part of the consequences (e.g. 
the pandemic, the climate, etc.). 


Similarly, the “Idealized Design” book is about a principle that is well known in design, 
but is so often not followed — again for many human reasons — that it was worth 
writing a book about it (and again aimed at management). An actual Design field — 
which would be a sibling of an actual Systems field — would have supplied the authors 
of this book with an even stronger set of examples to try to get management people
to understand (for example, the book was apparently published in 2006, but had
essentially nothing to say about the interactive, personal, networked computer
revolution, even though many of the best inventions were indeed teased out via
“Idealized Design” going all the way back to the late 50s and early 60s).


One way to lift all the areas with regard to what they leave out, would be to try to help 
practitioners to put some of the effort they now put into quick local solutions to 
simple seeming local problems, into ramifying out possible and likely consequences 
into larger environments and longer time periods. 


Alan Kay
Still trying to learn how to think better - 3y


Would artificial intelligence be possible if computer programming was not 
invented? 


Here's a somewhat different perspective on this. 


“Artificial” means something *made* (usually by humans). “Intelligence” is more 
difficult to pin down — humans like to think that they are, so let's take an informal 
view of this. 


So “artificial intelligence” could be construed to be “processes using methods, that 
exhibit ‘intelligence’ " — especially “great intelligence”. 


Now let us look around for such processes, either in existence or as ideas. 


I have a decent sized list of these, but let me just put forth *Science* as one of the
most interesting and powerful examples of “artificial intelligence”. 


I will be happy to explain this idea in response to comments, but I think it will be more
useful and interesting to start off with as a “ponderable”. 


And then many of the other things on my list will be easily discoverable. It’s very 
worthwhile going back in history (and prehistory) to think about the earliest human 
inventions of “artificial intelligence”. 


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 3y


Have you ever been asked to teach someone how to program computers by 
a person with no such background and how did you respond? 


This hasn't happened for some years now, but I've always tried to help them (we all 
started “with no such background” ...).


The two main parts are “what does it mean to do something at all”, and “what does it 
mean to make and interact with systems”. * 


* What follows, I first put in a reply to Petar’s comment below, but I think it should be
part of this answer: 


I have always done my work in the context of a group that is set up to maximize a
wide spread of talents and abilities. Just as “science” is “a better scientist than a 
scientist”, such a group is “a better programmer and systems designer than any 
individual". 


Really learning to program — etc. — is really learning about how to be part of a 
scientific/engineering/design team. As with a sport like basketball, there are lots of 
fundamentals that all need to do extremely well, and there are also “special abilities” 
that not every member will have, but which the whole team will have — plus the 
extras of synergy — when the group is functioning well as a team. 


Alan Kay
Invented and designed a few user interfaces ... - Updated 3y


How was the first graphical user interface (GUI) made on a computer in the 
absence of any other graphic software? 


I'm writing this because the 4 already existing answers — well meaning perhaps — 
range from being “pretty wrong” to “very wrong”. They all appear to have been done 
without any checking of history (and are perhaps mostly "just made up” from scraps 
that the authors have come by). 


To start with, we should at least run down one or two candidates for “the first 
graphical user interface (GUI)” as the question requests. Both of the “earliest” (aka
“very early”) programmable computers, ca 1949 (the Cambridge EDSAC and the
Manchester Mark I), could show pictures on CRT tubes, and both pretty quickly
implemented games using them: TIC TAC TOE on the former and checkers on the
latter. But neither had a pointing device.


So, let's restrict ourselves to something that both has a display and some kind of
pointing device, where something we do with the pointing device affects something
on the display via the computer interpreting the inputs and acting accordingly.


One of the earliest pointing devices was the “light gun”, invented by Bob Everett as 
part of the MIT Whirlwind computer development starting ca 1948, and starting to be
operational ca 1950. 


Whirlwind I control room with CRT display ca early 50s


The Whirlwind display was essentially an oscilloscope CRT, and the hardware 
command was an instruction that would put up a dot at an x-y location. 


The “light gun” had a photocell that could see a dot on the screen if it were there, and 
cause an interrupt. A picture of some kind — such as a character or a line — was 
made by progressively plotting points. This could be “seen” by the light gun, and by 
looking at the code after the interrupt, the dot that was seen could be identified, and 
then the structure the dot was part of could be deduced. 
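

The identification scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `plot_frame`, `display_list`, and `gun_radius` are hypothetical, and the real Whirlwind code was machine code reacting to a hardware interrupt, not a function returning a value.

```python
def plot_frame(display_list, gun_position, gun_radius=0.5):
    """Plot every dot in turn; report the structure whose dot the gun saw.

    display_list: list of (structure_name, [(x, y), ...]) pairs.
    gun_position: (x, y) where the light gun is pointed (an assumption
                  standing in for the photocell's field of view).
    """
    gx, gy = gun_position
    for structure, points in display_list:
        for (x, y) in points:
            # "Plot" the dot. If it lands inside the photocell's view, the
            # hardware would raise an interrupt at exactly this moment, so
            # the program knows which dot (and which structure) was seen.
            if (x - gx) ** 2 + (y - gy) ** 2 <= gun_radius ** 2:
                return structure, (x, y)
    return None  # the gun saw nothing during this frame


display_list = [
    ("line A", [(0, 0), (1, 1), (2, 2)]),
    ("char B", [(10, 10), (10, 11)]),
]
hit = plot_frame(display_list, gun_position=(10, 11))
```

The point of the sketch is that no picture memory is consulted: the structure is deduced purely from where the plotting loop was when the dot was seen.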


This was used in Whirlwind II, the immense SAGE air defense system computers in
the 50s.


A SAGE console from the mid-50s showing the CRT and light gun pointing device 


The experimental SAGE computer also used this scheme: the Lincoln Labs TX-2, which 
is most famous for being the computer that Ivan Sutherland used to “invent 
interactive computer graphics” as the large idea we have today via his “Sketchpad” 
system, which was done during 1962. Sketchpad also had a real “Graphical User 
Interface”, so it could be claimed to be “the first”. 


The TX-2 computer also plotted points, but now had a single instruction — in a 36 bit 
word — that contained the x-y address where the point should be plotted, and the 
address to jump to if the light pen saw this particular point. The TX-2 was perhaps the 
first computer to have many instruction counters that allowed processes to be 
interleaved without requiring conventional interrupts. 


Drawing a rivet using a light pen in Ivan Sutherland’s Sketchpad 


Ivan was not the first to create a whole graphics system in software from a HW point 
plotting instruction, but what he did do was so monumental compared to the 
rudimentary line drawing efforts, that he is universally regarded as the father of not 
just “interactive graphics”, but also of the “graphical user interface”. 


So: as with most things regarding computing, just the tiniest bit of HW/SW connection
to I/O plus SW is enough to make a GUI. The key ingredients are insight and design.


Alan Kay 
Still trying to learn how to think better - 3y 


Why is it so important to invent the future from the future rather than 
incrementing from the present? 


Incrementing from the present and inventing in the future both have their useful 
niches. 


My research community — ARPA/Parc in the 60s and 70s — was aimed at qualitative 
changes that were quite distant from increments, for example: the Internet and 
Ethernet, personal computing with bit-map screens and pointing devices, 2D and 3D 
graphics and graphical user interfaces, virtual reality, dynamic object oriented 
programming, laser printing, outline fonts, and many more inventions, each of which 
created new categories and often new industries. 


So ARPA/Parc was interested in the extremes of “invent the future from the future 
rather than incrementing from the present”. 


For these kinds of visions and goals the “present” — and the past that made it — are 
so vivid that they make it difficult to think about anything that isn’t already there in 
some form. 


The tools in the present are also a big problem. With big new ideas, a lot of work will 
have to be expended on workarounds, and this will slog down and perhaps kill 
progress. 


If computers are involved, all the above applies, with the addition of possibly being 
able to do enough optimization to get weak HW and SW to still do something new. 


However, once you take off your design hat and put on your optimization hat, it is 
very difficult to impossible to do further design — it is especially difficult to change 
your mind while trying to invent (and this is usually necessary). 


So, psychologically, we need to somehow eliminate the present as much as possible. 


Part of the spirit of the ARPA/Parc community came from some of the previous and 
enormous WWII invention and engineering efforts. One of the principles was to invent 
and make everything from scratch if that was called for. This was carried through to 
the Cold War science and engineering, and for computing, especially the SAGE air 
defense systems of the 50s. The scale of all of these was titanic. 


I wrote a white paper last year for the Ellen MacArthur Foundation in the UK, called
“How?”, that was about this, and I'll put in a link for this later (thanks to Phillip
Remaker for finding this link for me: https://internetat50.com/references/Kay_How.pdf).


The culture created by this was that young people in the ARPA/Parc community were 
immersed in thinking about “What is actually needed?” without worrying overly much 
about how difficult it might be. And certainly not to worry about existing machinery, 
vendors, tools, etc. 


The result was quite a bit of building scratch HW and SW systems when needed, 
sometimes taking an existing supercomputer and rewriting all of the SW so it could be 
used by a single user, sometimes making new mainframes or personal 
supercomputers, etc. This included new I/O gear such as new kinds of displays, 
pointing devices, printers, etc. 


In order for this to not just be a bottomless pit/black hole, quite a few skills had to be 
highly developed to pull off all the inventions and tool building in a timely fashion, 
etc. 


A general feeling towards the end of the 50s — partly from the appearance of the first 
integrated circuits — was that the supercomputers of the present would likely be 100s 
of times cheaper and smaller in a few years (maybe a lot more). By the middle of the 
60s Gordon Moore started publishing analyses of where ICs could possibly go, and 
this optimism (“a lot lot more!") fueled the ARPA ideas about “working in the future”. 


The basic idea was to maximize and elongate degrees of freedom and to be able to 
work as though one was already in the future (so that the kinds of optimizations that 
needed to be done would be as minimal as possible, and not cripple the invention 
processes). 


This was especially critical for ARPA/Parc because it was all about personal computing 
that was connected so groups of people could collaborate (Engelbart, etc.). This meant 
that “the future in the present” had to have lots of users of the future systems and 
these systems had to have sufficient real-time responsiveness. 


For example, at Parc we invented, designed, and built almost 2000 “personal 
supercomputers” in the early 70s that had the computing power of the commodity 
computers in the middle 80s and later.


Similarly, the ARPAnet and Internet inventions used some of the wires of ATT to carry 
bits, but invented and built all of the new kinds of packet switching systems and 
software needed, including the first routers, etc. and put them on top of the ATT wires. 


So: 


“Go into the future to invent, and then bring it back — and this means you have to 
make the future to go into where you can invent the future”. 


Sounds like a paradox — and it is not generally done today — but take a look at the 
results ... 


Alan Kay 
Still trying to learn how to think better - 3y


Has the real computer revolution happened yet? 


I've answered versions of this question before on Quora. 


In my research community — ARPA/Parc in the 60s and 70s — “computer revolution” 
meant something like the next qualitative leap from the “writing, printing, math and 
science” revolution. In other words, something on a scale of hundreds to thousands of 
years. 


I think the general sense of “computer revolution” today has to do with the
pervasiveness and convenience of technologies that for most people can imitate 
various forms of old media, and distribute them more widely. 


If we try to shape future qualitative leaps that are reasonable analogies to the big 
deals from and about “writing, printing, math and science”, almost nothing that most 
people learn and do today matches up. 


So, a kind of “computer revolution” has happened — and it is having consequences in 
the general public — but the one we were sure was possible — and partially invented 
— hasn't happened for the public (for whom most of the previous revolution of 
“writing, printing, math and science” hasn't happened yet either). 


Alan Kay
Still trying to learn how to think better - Updated 3y


How was version control handled in the early days of PARC? 

“In the early days of Parc” would be when there were Altos, Ethernet, Laser Printer 
service, but not yet a comprehensive set of remote file services. (I'm leaving out here 
the simulated PDP-10 mainframe — “MAXC" — that we built, which had quite a few 
versioning features, because these came with the “Tenex” OS.) 


Each Alto had one or two “Model 30" disk drives with 2.5 MB disk packs. Some of the 
systems run at this time did carry out forms of versioning. 


For example: "Bravo", the WYSIWYG word processor that became MS Word in the 80s, 
did both automatic file versioning and event replays from checkpoints. 


Smalltalk kept an automatic history of “changes” to code and could revert. Its virtual 
memory system also did automatic and invisible checkpointing every 20 seconds or 
less to guarantee a pristine recent system state in case of any kind of crash. 


A later more elaborate — and terrific — fine grain dynamic versioning system was 
done for Smalltalk — called “PIE” (by Ira Goldstein and Danny Bobrow) — but this was 
not in “the early days”. 


Alan Kay 
Have designed a few programming languages - Updated 2y
Why isn't meta-programming more common in software development? 


There are 4 existing answers as I write this, each of which has something useful to say.


“Meta-programming" at least means “programming the programming system” you 
are using to add features the programming system doesn't possess. 


Note that vanilla procedures can do this for many kinds of things in a language, so 
they furnish a kind of semantic extension. A granddaddy HLL — Algol — had two 
features that allowed this to be taken rather far (a) call by “name” and (b) a way to go 
beyond vanilla procedural syntax, by having an alternative to the comma.


Call by name sends the unevaled parameter and environment.


The alternative to the “,” was “) <any text> :(”.


The first allows a conditional expression to be defined as
if (a, b, c)


where the b and c expressions will be sent unevaled *with their environment*, and
then evaled or not *with respect to that environment* by the body of the procedure
depending on the value of a.


The second allows this to be written alternatively as:
if (a) then: (b) else: (c)
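

In a language without call by name, the effect can be approximated by passing thunks. The following is a minimal Python sketch, not Algol and not from the original answer; the names `if_` and `safe_ratio` are made up for illustration. Each `lambda` plays the role of an unevaluated parameter that carries its defining environment with it.

```python
def if_(condition, then_thunk, else_thunk):
    """User-defined conditional: evaluate only the branch that is selected."""
    if condition():
        return then_thunk()
    return else_thunk()


def safe_ratio(x, y):
    # Each lambda is an unevaluated expression plus the environment it was
    # written in (here, the local variables x and y of this call).
    return if_(
        lambda: y != 0,
        lambda: x / y,   # never run when y == 0, so no ZeroDivisionError
        lambda: 0.0,
    )
```

With ordinary call by value, `x / y` would be computed before `if_` was even entered, and `safe_ratio(1, 0)` would fail. With thunks, the body of `if_` decides which expression gets evaluated.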


Another way to think about all this is that quite a bit of program design these days 
winds up designing and defining APIs of one kind or another (including the interfaces 
to “objects"), and any library that will be used also connotes semantic extensions to 
the language. So it makes perfect sense to allow these to be cleaned up and made 
more readable via syntactic extension. 


Note that this can be done with macros, but the language has to have the quoting 
mechanisms to include passing environments as well as values (lambdas are one way 
to do this etc). Macros have the drawback that they are too syntactic (they are 
expanded as a “soft eval” at compile time), and thus the debugger has to be really 
capable to allow real-time debugging without confusion (most aren't). 


A better way to do all this is to start with a language that has deep reflective 
properties, and to consider anything meta as powerful and dangerous (meaning get 
really good before you start to mess with the meta-properties of your language — 
this is a different order of design). 


Our own experience with this is that "good programmers often make bad choices in 
the heat of debugging", and every extension puts a learning curve burden on every 
other programmer, especially if the extensions are clever and hide their 
parameterizations. 


The flip side is the same principle for tool making in general: you can get into deep 
death spirals trying to make your own tools, but if you do have the chops, then you most 
definitely should. These different orders of design just need to be heeded and the 
necessary care taken. 


The third general area in which meta-programming can often help is pragmatics i.e. 
optimizing. Typical programming styles often intertwine optimizations with the code 
that also expresses the meaning. This is a terrible idea, but is found everywhere. 


Quite a few optimizations are carried out by violating a natural module boundary and 
trying to get away with it. 


One can imagine designs in which the meanings are separated from the optimizations, 
where the latter can be turned off and the program will still work, etc. A language like 
Lisp or Smalltalk can actually allow features like these to be added to the language. 
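

One way to picture “meanings separated from optimizations” is the small Python sketch below. It is only an illustration under assumptions: the `settings` switch, the `optimized_by` decorator, and the `sum_to` example are all hypothetical names, and a real system would also want to check that the two versions agree.

```python
settings = {"optimize": True}  # hypothetical global switch


def optimized_by(fast_version):
    """Attach an optional optimization to a function that states the meaning."""
    def decorate(meaning):
        def dispatch(*args):
            # The optimization is used only while the switch is on;
            # with it off, the plain meaning runs and the program still works.
            if settings["optimize"]:
                return fast_version(*args)
            return meaning(*args)
        dispatch.meaning = meaning  # keep the plain definition reachable
        return dispatch
    return decorate


def _sum_to_fast(n):
    # Closed form; it must agree with the meaning below.
    return n * (n + 1) // 2


@optimized_by(_sum_to_fast)
def sum_to(n):
    # The meaning: add up 1..n, stated as directly as possible.
    total = 0
    for k in range(1, n + 1):
        total += k
    return total
```

Setting `settings["optimize"] = False` turns every such optimization off at once, which is a cheap way to test whether a bug lives in the meaning or in the optimization.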


A language could even have module mechanisms that confine the violations needed 
for optimizations. A good language will allow these features to be made via meta- 
programming. 


Another — related — feature that is both semantic and pragmatic is “trial evaluation” 
— to do this correctly, all side effects have to be restored on an “UNDO" or “backup”. 
Note that “trial evaluation” can also lead to “histories of versions” at different 
granularities — this can make a huge difference in both building and maintaining 
systems. 
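

A very small Python sketch of trial evaluation with a version history follows. It rests on a strong simplifying assumption: all state lives in one in-memory structure that can be deep-copied, so “restoring side effects” just means discarding the copy. The class and method names are hypothetical.

```python
import copy


class VersionedState:
    """Keeps every accepted version of some data so trials can be discarded."""

    def __init__(self, data):
        self.history = [copy.deepcopy(data)]

    @property
    def current(self):
        return self.history[-1]

    def trial(self, action, accept=lambda candidate: True):
        """Run `action` on a private copy; keep the result only if accepted."""
        candidate = copy.deepcopy(self.current)
        try:
            action(candidate)
        except Exception:
            return False          # the copy is dropped: nothing to undo
        if not accept(candidate):
            return False
        self.history.append(candidate)
        return True

    def undo(self):
        """Step back one accepted version (the first version is never removed)."""
        if len(self.history) > 1:
            self.history.pop()
```

A usage example: a withdrawal that would overdraw an account is tried and rejected without touching the current state, while an accepted deposit becomes a new version that can later be undone.

```python
account = VersionedState({"balance": 10})

def withdraw_25(data):
    data["balance"] -= 25

def deposit_5(data):
    data["balance"] += 5

rejected = account.trial(withdraw_25, accept=lambda d: d["balance"] >= 0)
accepted = account.trial(deposit_5)
```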


A really good framework in a language for metaprogramming will have every feature 
of the language internally as a reflective model already, and will have a late-binding 
way to get to these, change them, and add to them (for example, languages like Lisp 
and Smalltalk are built this way, with “turtles all the way down”).


There is also the concept of “gracefulness”. By the early 80s or so, Smalltalk was pretty 
graceful at all levels of definition. So I was very happy to see about 10 years later, in
“The Art of the Metaobject Protocol” (Gregor Kiczales, Jim des Rivieres, and Daniel G.
Bobrow), an even more graceful extension mechanism that went one level deeper
to allow really important really deep extensions to be made by using an even more 
carefully designed initial model. 


Good advice from Picasso: “Learn the Rules like an Expert so you can break them like 
an Artist”. 


Meaning: “Just breaking things doesn’t usually produce Art”.


Alan Kay
Still trying to learn how to think better - 3y


If the two digits of a number are multiplied together, the result is one third 
of the number itself. What is the number? 


15 = 1x5x3 
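

A brute-force check (a sketch added for illustration, not part of the original answer) confirms 15. Taking the question to mean two-digit numbers, it also turns up a second solution, 24 = 2 x 4 x 3. Writing the number as 10a + b, the condition is 10a + b = 3ab, and only a = 1, b = 5 and a = 2, b = 4 satisfy it.

```python
def digit_product_thirds():
    """Two-digit numbers equal to three times the product of their digits."""
    return [n for n in range(10, 100) if 3 * (n // 10) * (n % 10) == n]


print(digit_product_thirds())  # prints [15, 24]
```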


Alan Kay
Still trying to learn how to think better - 3y
Is it mostly a myth that computing advancements happened in a garage? 


Well, HP started in a home garage, but their computer advances didn't. 


Apple famously started in a home garage — here we have to stretch what I'd call a 
“computer advancement” to cover this. But here is what Woz said about this: 


“The garage is a bit of a myth,” Wozniak told Businessweek. “We did no designs
there, no breadboarding, no prototyping, no planning of products. We did no 
manufacturing there.” 


Woz did not invent or make the CPU or the memory or circuit boards etc. The nature 
of the code that was executed, and what it executed in, was devised by others. What 
he did do with some of the existing component technologies is to make a relatively 
simple computer that was still versatile enough to be interesting to hobbyists at the 
time. 


The Apple II was a nicer design and Woz did a great and insightful job, especially with 
regard to the integration of the display, processor, and memory. | would call what he 
got from the components that were used: “brilliant” and “useful”. So an advance of 
some kind without question. 


I think of “computing advancements” as things like “making the first programmable
computers from primary materials”, or inventing computer graphics, or inventing and 
making high level programming languages, or inventing the Ethernet or Internet, etc. 


In going down this long list, I couldn't find any garages (there might still be one or
two), and I found mostly people with advanced degrees of one kind or another (and,
yes, there are a few exceptions here also). 


In the ~60 years that I've personally been in computing (starting in 1961), I'm not
aware of anything significant from a “garage” (but, again, I don't claim to be
comprehensive here). 


Alan Kay
Have designed a few programming languages - 3y


Why did 4th Generation programming languages like SuperTalk not catch 
on? 

I would not call SuperTalk a “4th Generation” Language, but a “different generation”
language. 


The system that SuperTalk operated within was called “SuperCard”. It was essentially 
derived directly from Hypercard, and tried to dominate that category with many 
additional and superior features. I thought it was pretty good, and that Apple should
take up the challenge (unfortunately Apple couldn't even value Hypercard). Other
systems strongly influenced by Hypercard’s scripting language Hypertalk include, e.g., Flash.


During the first 5 years of Hypercard, Apple was able to count about 4 million end- 
users — of all kinds — who had made at least one scripted application. It's worth 
pondering this statistic. 


Neither system was able to deal well with the latter half of the double-edged sword:
“Simple things should be simple; complex things should be possible”. 


As an observer, I thought this was a shame, because Smalltalk overlapped very well
with the Hypercard scheme, where HC really solved a number of important UI and 
design issues for end-users, and Smalltalk’s semantics provided the “from simple to 
complex" spectrum needed. 


The sensibilities that were required for what was great about Hypercard’s (and 
Supercard's) design, had a very hard time also being able to live in the world that 
Smalltalk occupied. 


And then there was the opportunity completely wasted — squandered — by Apple of 
not seeing that a Hypercard-like system was the ideal model for authoring on the 
newly created web in the early 90s. (The web people couldn't see it either, and haven't 
been able to since ...) 


In any case, my experience over the last almost 60 years now, convinces me that the 
general audience for “programming languages” does not choose them because they 
are "good" (like pop culture and the consumer business, it is not generally a “value 
market"). 


Alan Kay
Still trying to learn how to think better - 3y


What benefits do you get from thinking of something (tech, etc.) as a 
medium rather than a tool? 


The way I thought about this in the 60s was that one way to define a tool is as
something that helps control some of the degrees of freedom in a medium while 
allowing concentration on selected others. 


For example, “paper” can be thought of as a medium (lots of degrees of freedom), 
whereas pencils, pens, brushes, etc. can be thought of as tools. At a higher level, 
choices of language, style, rhetoric are tools that help writing. 


Computers are “meta-media” in that their content is descriptions that are universal 
enough to define both new media and tools for them. The most interesting ones are 
those media that can only exist because of being described on a computer. 


The commercial/consumer use of computers hardly touches this because the 
customers generally want familiar old media in a more convenient form (so that 
almost no new learning needs to happen in order to use it). 


This is “the small point” about computing, but misses the larger more important 
points. 


It's “a shame" that most of the “profession” of computing has also missed being able 
to see just what it is they are dealing with. 


Alan Kay
Still trying to learn how to think better - 3y


What is the significance of late binding? 


“Late-binding” is an idea about having desired functionality while retaining as much 
flexibility and safety as possible. 


A big one to consider is that “software” itself is a late-binding of how a computer will 
behave. (Seems to be a pretty good idea, I'd say ...) 


At the next level, the idea of index registers in hardware late-binds addresses in a way 
that is not destructive of code. Memory management units (MMUs) allow global 
relocations with encapsulated local addressing. Indirections of various kinds allow 
easier changes at run-time. 


Consider a variable ... 

Consider a subroutine compared to in-line code ... 

An interpreter late-binds semantics of a computer. 

Consider microcode as a way to later bind the hardware via an interpreter 


An encapsulated object late-binds the “hows” (methods) and allows usage just via 
“whats” (meanings), and this allows various kinds of alternatives and transparencies to 
be more easily accomplished. 


The “hows” themselves can also be late-bound e.g in Lisp or Smalltalk this allows 
programs to analyze how other parts of the system are set up: to “reflect” on what and 
how things are done. 
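
As a rough illustration of the last two paragraphs, here is a small Python sketch (Python stands in for Lisp or Smalltalk, and the class and function names are invented for this example). The caller only states the “what” (`area`); which “how” runs is looked up at call time, can be rebound while the program is running, and can be inspected by other code.

```python
import math


class Circle:
    def __init__(self, r):
        self.r = r

    def area(self):
        return math.pi * self.r * self.r


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


def total_area(shapes):
    # Only the "what" is named here; each object's "how" is found at call time.
    return sum(shape.area() for shape in shapes)


def describe(obj):
    # Reflection: ask an object which public methods it currently responds to.
    return sorted(
        name for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name))
    )


shapes = [Square(2), Square(3)]
before = total_area(shapes)


def doubled_area(self):
    # A replacement "how", installed while the program is running.
    return 2 * self.side * self.side


Square.area = doubled_area  # rebind the method; total_area is untouched
after = total_area(shapes)

print(before, after, describe(Circle(1)))
```

Nothing in `total_area` had to change when the method was replaced, which is the flexibility being described.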


(“Reflection” done well can allow more pragmatic possibilities, even at a very low
level, to be carried out safely and efficiently. A good book — mentioned in a comment
below — is “The Art of the Metaobject Protocol” by Kiczales, des Rivières, Bobrow ...)


From a strategic standpoint, retaining as much late-binding as possible is a very good 
idea. It is much easier to dynamically remove some of the degrees of freedom if 
absolutely needed, but really difficult to go the other way round from an early-bound 
system to more flexibility. 


For example, quite a bit of optimization is violating module boundaries and hoping to 
get away with it. One way to do this in a more principled manner is to really have late 
bound modules as a model, and then implement the violations as “pragmatic 
features” of the programming system. 


For example, a method in an OOP system could have a “left-hand side” that is just 
semantics, and an optional "right hand side" of cases with optimizations. The method 
should run perfectly with the right hand side turned off, but will run faster etc with it 
turned on. 
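
One way to picture this “left-hand side / right-hand side” split is the Python sketch below. It is only an illustration under stated assumptions: the `SETTINGS` switch, the `optimized_by` decorator and the `sum_to` example are all hypothetical names, not features of any particular OOP system.

```python
SETTINGS = {"optimize": True}  # hypothetical switch for the "right-hand side"


def optimized_by(fast_version):
    """Attach an optional fast path to a reference ("meaning") implementation."""
    def wrap(reference):
        def method(*args, **kwargs):
            if SETTINGS["optimize"]:
                return fast_version(*args, **kwargs)
            return reference(*args, **kwargs)
        # Keep both sides reachable so they can be compared against each other.
        method.reference = reference
        method.fast = fast_version
        return method
    return wrap


def _sum_to_closed_form(n):
    # Right-hand side: an optimization that must agree with the meaning below.
    return n * (n + 1) // 2


@optimized_by(_sum_to_closed_form)
def sum_to(n):
    """Left-hand side: the meaning, stated as plainly as possible."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


print(sum_to(100))            # uses the fast path
SETTINGS["optimize"] = False
print(sum_to(100))            # same answer from the reference semantics
```

Because the reference version stays runnable, the optimization can be checked against it (for example, `sum_to.reference(n) == sum_to.fast(n)` over a range of inputs) and switched off entirely if it is ever in doubt.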


Similarly, interactions between modules can also be handled in this dual way. (For 
example, Smalltalk is a “Message passing system” but did not manifest actual 
messages unless absolutely required. Also, Smalltalk had simulations of all of its 
semantics and could retreat to them if the lower levels on some machines were not 
complete or had bugs.)


It's worth noting that deep enemies of graceful and useful late-bindings are pernicious 
— and especially gratuitous — dependencies. These can happen in an early bound 
system, but there are more possibilities in a late-bound one (so much more design is 
required to really take advantage of the idea — this will pay off in many ways ...). 


Alan Kay 
Still trying to learn how to think better - Updated 3y


What needs to be done in order to improve Anki to reach the promise of 
the Dynabook's “teacher for every learner”?


There's definitely a way to think of learning as ultimately being able to remember — 
and every culture has found a lot of things that need to be remembered, are able to 
get children to eventually remember them, and have some of their behaviors be in 
accordance with their memories. 


But if we look at history, we find large changes in context of both what kinds of things 
to learn, and what it means to learn them. For example, the invention of writing 
brought not just a huge extension of oral knowledge, but an even more critical 
change of context: getting literate is a qualitative change, not just a quantitative one. 
A large goal of “learning to read and write” is to cross that qualitative threshold. 


A change so large that it is hard to think of as an extension of the prevailing thinking 
patterns in the era of its birth, was the invention of “modern science” less than 500 
years ago. It started with the return of accurate map making of all kinds and was 
catalyzed by the gradual realization that much of “the world was not as it seems” and 
by being able to make generalizations that could generate some of the maps. One of 
the most important larger perspectives on this has its 400th anniversary this year: 
Francis Bacon's “A new organization for knowledge” (Novum Organum Scientiarum), in
which he points out that we humans have “bad brain/minds” stemming from a 
number of sources, including our genetics, cultures, languages, and poor teaching. He 
proposed a "new science” that would be a set of approaches, methods and tools, that 
would act as heuristics to try to get around the biases and self generated noise from 
our “bad brains”. 


His proposed “new science” is what today we call “science”. Before this, “science” 
meant “a gathering of knowledge” and “to gather it”. After this, it meant to move from 
knowledge to context and method and tools — and to new behaviors. This has led to 
not just a lot of new knowledge, but very different knowledge: qualitatively different 
knowledge in qualitatively different contexts. 


A trap here is that when ordinary language is used for discussing these three contexts —
oral, literate, scientific — things can be said and heard whether or not the
discussants also have these contexts (this was one of Bacon's four main “bad brain”
traits).


E.g. people who can read but have not taken on the scientific world-view can think 
they understand what science is, and can learn and memorize many sentences “about” 
science, without actually touching what they actually mean. 


Just as interesting, is the difficulty — for those who have gotten literate — of touching 
what is really going on — especially the feelings — in oral traditional societies. Music 
and poetry are bridges, but important parts of the innocence and id-ness are hard to 
get to. "Ecstatic music” can sometimes dominate one’s literate thought — especially 
when performing it. 


To make an analogy here: in our society, there are courses in “music appreciation” that
mostly use “sentences” about “sounds”, “relationships”, “composers”, etc., in which
most testing can be (and is) done via checking “the memory” of these “sentences”.


By contrast in “real deal music”, real music teachers treat their students as “growing 
musicians” and play with them as a large part of the guidance to help them “get 
larger”, to "make Technique be the servant of Art, not the master”, etc. It’s primarily an 
emotive art form ... 


A nice quote — which has many web pages — is: 
“Talking about Music is like Dancing about Architecture” 


(attributed to many people from Stravinsky to Frank Zappa). If you *do* music, you
can barely talk about it, just a little. The further away from inhabiting music, the less
the words can map. (And: note that the quote brilliantly achieves a meta way to do a
bit of what it says is difficult ...)


The Dynabook idea — “a personal computer for children of all ages” — was primarily 
about aiding “growth in contexts"* and my initial ideas about it were partly about 
asking questions such as: 


“If we make an analogy to writing/reading/printing-press, what are the qualitatively 
new kinds of thinking that a personal computer could help to grow?” 


I got started along these lines via Seymour Papert's ideas regarding children,
mathematics and computing (my mind was blown forever). I added in ideas from
McLuhan, Bruner, Montessori, etc., and ... Bacon ... to start thinking about how a
personal computer for children could help them take on the large world-view of
science as “real science learning” (not “science appreciation”).


(Via Papert), the dynamic math part of quite a bit of science can be nicely handled by
inventing special programming languages for children. But science is not math —
math is a way to map ideas about phenomena — so an additional and important part
of learning science requires actually touching the world around us in ways that are
more elemental than “sentences” — even the “consistent sentences” of maths.


In an ideal world, this would be aided by adults and older children. In the world we 
live in, most children never get this kind of help from older children, parents, or 
teachers (this is crazy, but humanity is basically “crazy"). 


Another way to look at this is that — as far as science goes — it almost doesn’t matter 
what part of the world you are born into and grow up in: the chances of getting to 
touch the real thing are low everywhere. 


Several of Montessori’s many deep ideas were key for me. 


One is that children learn their world-view not in class but by living in that world. She
said the problem was that the calendar said 20th century but their homes were 10th
century. So she decided to have her school *be* the 20th century, to embody it in all
the ways she could think of in the environment itself.


Another deep idea is that what is actually important is for children to do their learning 
by actively thinking and doing — and with verve and deep interests. She cared much 
more about children concentrating like crazy on something that interested them than 
about what that thing was. She invented “toys” that were “interesting” and let the 
children choose those that appealed to them (she wanted them to learn what deep 
concentration without interruptions was like, and that teachers were there to help and 
not hinder). 


In other words, she wanted to help as many children as possible become much more 
autodidactic. 


(Note that this has much in common with getting to be a deep reader or musician — 
it doesn’t much matter in the beginning what the titles are, what matters is learning 
how to stay with something difficult because you want to learn it — if the 
environment has been well seeded, then all will work out well. More directed choices 
can and will be done later. And note this is even the case with learning to speak!) 


After doing many systems and interfaces over quite a few years (~25) we finally got a 
system that was like the Montessori toys part of her school (Etoys), and then, in a 
Montessori/Bruner type of school (the Open Magnet School in LA), we got to see what 
could be done with children, the right kinds of teachers, and a great environment to 
play in and with. 


What never got done, was to handle the needs of children who don't have the needed 
kind of peers, teachers or parents around to help them. This help is not just to answer 
questions but to provide a kind of “community of motivation” and “culture” that is 
what human beings need to be human. (The by-chance forms of this tend to be very 
much reverted to oral society practices because of our genetics — and much of this 
will be anti-modern, and even anti-civilization. This is a very difficult set of designs to 
pull off, especially ca. where we are now.) 


To answer your question: the spirit of Anki is not close to what the Dynabook was all 
about. It could possibly be a technical aid for some kinds of patterning, but it seems to 
miss what “contexts” are all about. 


Here's another way to think of some of this stuff, and in a “crazier” fashion. 


There have been a number of excellent books over the years about the idea that the 
“invention of prose via writing killed off ‘the gods’ ". These are worth finding and 
pondering.* 


The two main problems are (a) we need “the gods"; and (b) “the gods" can be very 
good or bad for us (“they” don't care). 


It's worth pondering that from the perspective of science, a metaphor is a lie, but from 
the perspective of “the gods", a metaphor is true. 


The dilemma of our species — and ourselves — is that we have both of these 
processes in our brain/minds, we need them both, and we need to learn how to allow 
both to work**. 


Learning something really deeply and fluently goes way beyond (and *before*) 
conscious thought — important parts of the learning are taken to where “the gods” 
still lurk. 


And, just as you don't make up reasons for breathing (which “the gods" also handle 
for you), the reasons for doing these deep things move from “reasoning” to 
“seasoning” — for life itself. 


“Artists are people who can't not do their Art”.


It doesn't have to do with talent or opinion ... This is a critical perspective for thinking 
about we humans, and what one of the facets of “identity” could mean ... Consider the 
relationship between the quote above and children ... 


When you are fluent in music, much of the real-time action is being done “by ‘the 
gods’ ", whether playing, improvising, composing etc. You are not the same person 
you were when you were just getting started. Music can get pedantic and over- 
analyzed, but this can be banished by experiencing some of it that is so overwhelming 
that it can't really be analyzed in the midst of the experience (this is not just certain 
“classical” pieces, but some of “pop” music can really get there as well). This produces 
the “oceanic feeling” that Romain Rolland asked Freud about. 


“Goosebumps are a kind of ‘basic ground’ for ‘humanity’ ”


It's interesting and important that “the gods” can be found at the grounding of very 
new contexts such as modern science, and that the two can be made to go 
together.*** 


To use this weirder way to look at things: 


“Education has to lift us from our genetic prisons, while keeping ‘the gods’ alive and 
reachable”. 


* For example: Eric Havelock’s “Preface To Plato”, and especially Julian Jaynes’ “The 
Origin of Consciousness in the Breakdown of the Bicameral Mind” (my vote for the most 
thought provoking book that is perhaps a bit off). 


** See Daniel Kahneman's “Thinking, Fast and Slow”, and ponder his “System 1”.


*** See Hadamard's “The Psychology of Invention in the Mathematical Field”, and
Koestler's “Act of Creation”.


Alan Kay
Have designed a few programming languages - 3y


How do I lay out my software architecture before I code if the requirements
change every day? 


Here’s an exemplary process from the past that is a good start on how to think about 
the present, and even more importantly about the future. 


A change in federal laws in the mid-80s rendered much of public utility Brooklyn 
Union Gas's database and reporting obsolete. This included more than 1 million 
customers at that time. 


The CEO said “This must never happen again”. The CIO (as was often the case back 
then) didn’t know much about computing, but had great trust in his head systems 
designer and programmer Tom Morgan (who did know a lot, and also had a degree in 
pure math). 


Morgan was friends with John Davis of Andersen Consulting, who with his group had 
been experimenting deeply for several years with object-oriented design (Smalltalk on 
workstations) — and this because of the farsighted view of Mel Bergstein, Andersen
Consulting's head of Technical Services (I was consulting for Mel at this time, so I got
to see what transpired first hand).


Morgan and Davis decided to go all out. First, to solve the regulatory problem, but 
then to include a recasting of most of BUG’s information processing. The requirements 
were fluid. The target had to be the two IBM 3090 mainframes that were in-house and 
could not be replaced with anything more suitable. 


The larger story can be found in Harvard Business School case studies and in several 
IEEE papers. Here I'll just look at the part that helps with the Quora question. 


Designing and developing on the IBM mainframe would be a terrible route (the 
languages available were COBOL, PL/1, and assembly). 


So they decided they would first quickly build the entire system as a working 
prototype in Smalltalk to make sure they had the requirements and possible 
architectures running and vettable (non-debuggable “requirements” are not
“requirements”). In a large room, they set up all the workstations, and used a version
of the Smalltalk system tracer to graphically render the system and print it out on 
large tiled charts that could be put on the walls to show a large picture of the entire 
system. 


Every night (late!) the system would print out its organization (using an enhanced kind 
of E-R structure), and the tiled printouts were pasted up on all the walls of the room. 
In the early morning, the team would “do rounds” (as in a hospital) to see how "the 
patient” was doing. Questions that needed more detailed answers could be 
immediately answered by using the live Smalltalk on the workstations. 


They would take notes, then have different kinds of meetings to decide what needed
to be changed, what needed to be added, etc. The extreme late-binding and “eternal
lifetime” of Smalltalk allowed changes to take place in about 1/2 second (you don't
ever have to do a systems rebuild, etc.), and they would spend the day fixing and
debugging. This allowed a very neat and sophisticated design to be rapidly evolved,
programmed, debugged and vetted.


Meanwhile, the underlying semantics of Smalltalk were programmed for the 3090s (in 
PL/1 !) and debugged. This was relatively easy because Smalltalk had a working model
of its own semantics (called the “interpreter simulator”) that served as a runnable
model. The compromises taken here were interesting, worked well, and are out of the
scope of this answer. 


And there was work to make DB/2 be the backing store for the new object-base (I will
admit to helping with the design of the intermediate object-cache, which was 
modeled on the Smalltalk-76 OOZE virtual memory swapping system at PARC of 10+ 
years earlier). 


The architecture that was finally arrived at was essentially a real-time simulation of the 
company itself (and they could run the simulation ahead in time for forecasting, in the 
manner of a super-spreadsheet but with graphics). 


Why real-time? Because they decided to add radio tracking of their 1500 service 
vehicles to coordinate needs, especially gas leaks ("from the smell of gas to a fix in less 
than 15 minutes” was the motto they decided on). The control room for this system 
showed large screens for Brooklyn, the locations of the fix-it trucks, and many other 
kinds of info (it was like a NASA or NORAD command center). 


The whole process from start to finish took about 18 months, winding up with 6 
people to maintain the system, after about 130 were involved in the project at its 
peak. 


The key idea here is that CAD<->SIM done well will allow FAB to be merely a lot of 
work (but close to the minimum, straightforward, and without a lot of additional 
bugs). 


We could imagine a future system much better than Smalltalk at doing the 
CAD<->SIM part (for example, qualitatively better semantics for capturing and 
running requirements would be good, and much more comprehensive real-time 
displays and visualizations of the whole system on the fly, etc.). We could imagine a 
more automated route for taking the semantics from CAD<->SIM to FAB. And so 
forth. 


But I think, more than 30 years later, the BUG/AAC system design and process to
create it stands up very well compared to anything I've seen more recently in large 
companies. 


Alan Kay
I'm the “Alan Kay” in question (try Google for the usual misinformation) - 3y


If a robot can achieve independent evolving thinking like a human being, 
shall we treat them like a human being instead of a machine? 


An interesting dimension and question is “rights and standing”. 


Robert Heinlein took up this question in the larger sense in his 1947 short story “Jerry 
Was A Man” (available online, perhaps not quite legally, at Jerry Was A Man).


Worth reading — as are so many of Heinlein's writings (especially the earlier ones). 


Alan Kay 
Still trying to learn how to think better - 3y 


Is software engineering culture based on traditional engineering culture? 


My perception is that there are so many flavors of "Software Engineering” today — 
including many that claim “SE is what we do by definition” — that I can't answer this
question in general. And there is certainly a wide range of approaches that is 
apparent, from horrendous to "more principled”. 


One thing I look for is similarities to some of the strongest approaches in well
established engineering disciplines — civil, chemical, aeronautical, electrical, 
mechanical, bio, nano, etc — and these all include superior CAD tools combined with 
high definition SIMulation of the CAD models (using supercomputers as needed). 
Many of them can then take the debugged models directly into semi- or fully- 
automated FABrication which involves optimization processes, etc.


If we compare with the practices in software, it is hard to find similar levels of 
CAD<->SIM — remember CAD is about capturing the meaning and intent of the 
design in a detailed understandable way, and the SIM is to vet the meanings of the 
CAD. This is primarily semantic. The FAB part is to do all the optimizations of the 
meanings to allow practical deployable artifacts. 


I've been trying to find at least one great example of this as of 2020. 


What I have been able to find are a number of interesting FAB techniques that are
better than they used to be — for example, LLVM and other bottom-level targets and
optimizers — but most programmers seem to still program primarily in terms of
intertwining meanings and optimizations in the same code (this is not a good
approach), and the general design tools seem much too low-level.


And “SIM of meanings" doesn't seem to exist as an operative idea today (there were 
some examples in the past, so I'm guessing there are some today ... somewhere ...) 


Part of what I think has happened is that the movement from raw “hows” in the 50s
upwards in level of language (towards more “whats”, and capture of meaning and 
intent) seems to have ground to a halt somewhere in the 80s (maybe into the early 
90s?). This really makes it difficult to do a great CAD system (the language forms need 
to be about meanings and intentions and constraints). 


The simplest answer I can come up with here is that most software efforts are not
done by people who have studied real engineering — so I think the larger answer is
“generally, no”. And most of the recent “software engineering” in large companies I've
seen would also indicate “no” (but here I don't feel that I've looked at a
comprehensive enough survey).


But, looking at the work of some of the people who came up with both the term 
“software engineering” and the aspirations for it in the 60s — for example, Randell & 
Russell, Margaret Hamilton, etc. — I can definitely see real — and successful — efforts
back then. And that probably means there is a tiny percentage of software makers 
today who are really doing “software engineering” where “engineering” can be used in 
a modern sense. 


So I'm holding out for at least one exception to this gloomy picture, and I'm dying to 
be shown it. 


Alan Kay
Still trying to learn how to think better - 3y


What part of Burroughs Systems did you work on? Did you enjoy it? What 
did you enjoy most, and why? 

I wrote quite a few programs for the Burroughs 220 — a vacuum tube core memory
computer of the late 50s — while in the US Air Force in the early 60s. This machine
was a bit like the MIX machine that Don Knuth used for his “Art of Computer
Programming” books (Don wrote a lot of code for the Burroughs 205 that was earlier
but with a very similar architecture).


While in the Air Force, I learned the B5000 in advance of its arrival, but it was a bit late
and I went back to college before it arrived. This was a very different architecture from
any other computer — aimed at directly supporting higher level languages, multiple
processors, virtual memories, emulating “data”, and deep safety, etc. I will confess to
only understanding some of this, with the rest being filled in a few years later in grad
school.


I had learned enough to want to “mine” Bob Barton, the main inventor of the B5000,
who was a prof in grad school (but he was famously reticent about his past 
achievements). 


I did not work on any Burroughs HW after the Air Force (but did learn the 6500, etc
without using it). 


However, both the details and the idea of the B5000 were used over and over by me
and other ARPA (and other) researchers, mostly emulated, especially in microcode 
when this was possible. It was the basis for how languages were implemented at Parc 
on the many different microcoded machines we wound up making. 


In a talk I once claimed that the B5000 had “7 out of the 10 best ideas in software
implemented in its HW". (Less hyperbolic than it sounds ...) 


Alan Kay
Still trying to learn how to think better - 3y

Did ARPA/PARC preemptively try to measure the impact of its research 
projects? 


I'm not sure I understand the question, but as far as I know, the answer is “no”.
There were sometimes “thresholds” that were agreed to in advance by researchers. 


For example, at Parc, Butler Lampson urged at the founding of Parc, and we wound up 
agreeing, that we just had to engineer inventions so they could be used by a 
minimum of 100 users (meaning for HW, at least 100 personal computers, or running 
100 machines on an Ethernet, etc.). SW inventions had to be done enough, supported 
enough, documented enough, etc. for 100 users. ... 


There were agreed-on thresholds within ARPA: e.g. for how many computers the
ARPAnet should be able to support, what kinds of vocabulary etc. the joint “task force”
on connected speech recognition should support. And of course, really large
thresholds for the Internet.


And so forth. 


Alan Kay
Worked at Apple (company) - Updated 3y


What is it like to be an Apple Fellow? 


This was a point in time between 1984 and 1996, so I can’t speak for earlier or later.


The history of Fellow programs goes back to the President of MIT — Jerome Wiesner 
— wanting to do something for his “odd duck” genius faculty members who were a 
bit weird for regular funding. His initial case in point was mathematician Norbert
Wiener, one of the inventors of Cybernetics, who could define the term “eccentric”.


Wiesner decided to set up a program called “MIT Institute Professors” — not a 
promotable position, but would be an elected honor, etc. — and this included both 
salary for life, and a research stipend that was completely discretionary: enough to get 
most research projects started without requiring normal reviews. 


Tom Watson was visiting the large IBM Cambridge Labs and heard of this, and decided 
this was the perfect antidote for his button-down conservative essentially marketing 
driven company. He set up the IBM Fellows program (which was greatly expanded by 
the later Chief Scientist of IBM, the very able Lew Branscomb). This was the same 
scheme as the MIT Institute Professors, plus one interesting power: any IBM Fellow 
could walk into the office of the president of IBM at any time without making an appointment if
there was a felt urgency. There were many famous IBM Fellows including: John Backus, 
Benoit Mandelbrot, etc. 


Other companies followed suit, including Xerox, which had about 16 or so while I was
there. 


Butler Lampson and I both wound up as Xerox Fellows.


When I went to Apple — as a Fellow — there was already a Fellows program, but they
asked me to lay out the charter to be similar to IBM and Xerox. 


So being an Apple Fellow was as described above. The small discretionary stipend for 
getting things going helped many things at Apple. Besides my own work, Bill Atkinson 
had the resources to prototype Hypercard as he wished, and this was one of the most 
significant end-user systems of the 80s. The basic idea is that you would have to 
compete with everyone else for the larger funding needed for next stages — but you 
could compete with a working prototype at least. Eventually Apple Marketing got so 
strong that it brushed aside anything it didn’t think it could sell easily, and this started
a long slide downwards for Apple.


After this I took my research group to Disney where I was a Disney Fellow, and then
eventually a half and half arrangement with HP (ditto) until Carly got fired and about
1/3 of HP Labs was laid off. (But these are other stories ...)


Bottom line is that Fellows programs are very good ideas if the companies can adhere 
to Wiesner's original design. The type of person that got chosen to be a Fellow in the 
old days was brilliant in important ways, very independent, already hard at work, 
usually underfunded, and not a good politician. The Fellow design helped all of these 
considerably. 


Alan Kay
Still trying to learn how to think better - 3y


How might drawing input work on a Dynabook? 


Here's the original cartoon and cardboard model from 1968. 
The First Dynabook idea — Fall 1968 
Alan Kay 


You can see that one of the features is a stylus (modeled after the original RAND 
tablet stylus, and also the stylus and tablet we did for the Flex Machine). 


In the 1972 paper I wrote for the ACM National Convention, I also mentioned that the
whole face of the Dynabook could be a touch-sensitive display (this has tradeoffs as 
an idea, as people have discovered with iPads — it is not really suitable for lots of text 
entry). 


Experience with the really good character recognizer software on the RAND tablet for
the GRAIL system convinced us that even a perfect character recognizer (it was close)
wouldn't cover text entry — so there needed to be both a stylus and a keyboard.


It is not at all a surprise that this is still true, and I complained to Steve about it when
the iPad first appeared (he didn’t draw, etc. and didn’t type much ...). Why it took them 
5 years is one of those pop culture and marketing mysteries ... 


It's worth re-mentioning that by 1968, there were two interactive systems with 
pointing devices and hyperlinking: Engelbart’s NLS and GRAIL. Both had deep virtues. 
GRAIL felt a lot better. And, besides the plasma panel flat screen model (with power
hungry pixels) there was also the start of liquid crystal flat panels from George
Heilmeier at RCA and then Peter Brody at Westinghouse that required very little
power:




George Heilmeier at RCA in the 70s with a larger LCD panel. 


And, besides the ARPAnet, ARPA was also in the process of including wireless 
networking. 


And, the Flex Machine was about the right size of computer for first waves of VLSI 
integrated circuits that could go on the back of a display. 


So you can see that I didn't have to do a lot of imagining to come up with the
Dynabook idea ... once I saw what Papert was doing with “children and powerful
ideas”, it was easy.


Alan Kay
Still trying to learn how to think better - 3y


What are some software architecture mistakes where things were done at 
the wrong layer or level of abstraction? 

If you knew there was going to be an Internet with many different kinds of computers, 
“operating systems” etc., and you realize that software “wants to run everywhere", 
could you then imagine that traditional theories of “operating systems” and 
“programming languages” etc are quite at odds with a good future from the 
standpoints of both architecture and abstraction? 


If you can, then you can see that “almost everything” today is way off ... 


Alan Kay
I am the Alan Kay in question. - 3y


Was the Plato IV system any influence on the Alto and PARC in general? 


I thought I had answered this a few years ago, but can't find it.


In 1968, ARPA convened its first grad students conference at the U of Illinois, with two 
students selected from each ARPA project. John Warnock and I were the two from the
Utah project. John talked about the first practical algorithm for continuous tone 3D
graphics (which he had invented), and I talked about a desktop personal computer
with OO everything and a multiple windows UI (non-overlapping) — the Flex Machine
— that Ed Cheadle and I had invented.


Self portrait of the Flex Machine ca 1968 on its nice HP calligraphic display 


While there we got a tour of the U of Illinois and saw one of the first working flat 
screen displays — using a neon “plasma”. 


This was being developed for the Plato IV system terminals at the Plato project there. 


These consumed a lot of power per pixel, and on the eventual 512x512 panels of a 
few years later, you couldn't light all the pixels at once (so you had to display light 
on dark, which is not the best in regular lighting). When Parc started up a few years 
later we got one of these displays (without the terminal) and hooked it to a DG 
Nova minicomputer to try things. 


However, the pixels were bistable so you didn’t have to refresh from a memory, and 
the actual main reason for all this was that Don Bitzer who headed the Plato project 
wanted to rear project from a slide carousel that would be on top of the terminal (so 
high res pictures could be incorporated in the Plato lessons). 


A Plato IV terminal sometime in the 70s (you can’t see the carousel on top). She 
can point with her finger because there was a cross grid of infrared lights and 
sensors to provide (very) low resolution pointing. 


So the terminals were as large as if they had CRTs, but the memory and the projection 
possibilities made the risk of making a new kind of display worth it. 


We liked the Plato people a lot. Like the ARPA community and Xerox Parc they just 
invented and built everything they wanted that they couldn't buy. 


But they and Parc were on completely divergent paths. Plato ran on a 1000 terminal 
time-sharing system, and the displays were slow. So what they went after was very 
different (some of it was quite good, and some of it inspired us to do better with 
the vastly more powerful/person Alto). 


Back to 1968. We were aware that flat screen displays were coming, but it was very 
exciting and inspiring to actually see a working one. This led to discussions about 
when the transistors in the Flex Machine could be put on the back of a flat screen 
display to make a tablet personal computer (the answer was in about 10 years we 
thought). 


I didn't think much more about this until a few months later (still in 1968) I visited
Seymour Papert and saw his wonderful work with children. It was a life's path changer.
On the plane back to Utah I drew this cartoon:


Suddenly, this seemed super important, because kids should be able to have fun 
learning and computing wherever they were. They needed “Dynabooks"! 


This became a deep goal, and it was one of the motivations a few years later at Parc 
for the Xerox Alto (whose first name was “The Interim Dynabook”).


(The Plato people thought we were literally crazy to try this ...) 


Alan Kay
Have designed a few programming languages - 3y

What is the best way to learn Lisp in 2020? 

Much of learning a language today is “learning libraries”, but what's really important
about Lisp is its approach to computation, representing computations, and
meta-reflection.


(Lisp is just the tiniest of universal kernels plus a “library” — so the learning of Lisp is 
learning its kernel in detail.) 


So learning Lisp by building it from the bottom up is advised. I like the
original McCarthy treatment of just starting out with the idea of a pair and a few
primitive operations.
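
(As a rough illustration of "a pair and a few primitive operations", here is a minimal sketch in Python. It is not McCarthy's own formulation: representing a pair as a 2-tuple and the empty list as `None` are assumptions made only for this example, and `lisp_list` and `length` are hypothetical helpers.)

```python
# Hypothetical sketch: a pair is a 2-tuple and the empty list is None.

def cons(a, b):
    """Build a pair from two values."""
    return (a, b)

def car(pair):
    """First element of a pair."""
    return pair[0]

def cdr(pair):
    """Second element of a pair (for a list, the rest of the list)."""
    return pair[1]

def atom(x):
    """True for anything that is not a pair."""
    return not isinstance(x, tuple)

def lisp_list(*items):
    """Chain pairs together to make a list: (1 . (2 . (3 . None)))."""
    result = None
    for item in reversed(items):
        result = cons(item, result)
    return result

def length(lst):
    """Walk the chain of pairs, counting elements."""
    count = 0
    while lst is not None:
        count += 1
        lst = cdr(lst)
    return count
```

Everything else (lists, trees, association tables) can then be built out of these few operations, which is the point of starting from the bottom.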


Peter Norvig has an excellent web page on how to make the Lisp primitives 
themselves in just a few lines of code from a more standard language. This is really 
good if you are already a programmer. It has a very nice feel to it. (How to Write a
(Lisp) Interpreter (in Python))


Peter also has a nice set of links to other references at the bottom of this web-page, 
and also a link to a more elaborate implementation. 


However, going the opposite direction is really good also. For example, if you 
implement an “FEXPR” version of Lisp (it doesn't eval its arguments before a call — the 
callee gets to decide) then you can get rid of all the “special forms” — like quote, 
lambda, cond, etc — which are now just vanilla parts of the language. You can also
look at some of the lazy evaluation schemes (that are today part and parcel of many
“functional languages”).
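
(A minimal sketch of the FEXPR idea in Python, under stated assumptions: expressions are nested Python lists, symbols are strings, and every name here, including `evaluate`, `strict` and `global_env`, is made up for the illustration. It is not the mechanism of any particular Lisp. The thing to notice is that `evaluate` has no special cases for quote, cond or lambda; they are ordinary entries in the environment that receive their arguments unevaluated.)

```python
def evaluate(expr, env):
    if isinstance(expr, str):          # a symbol: look it up
        return env[expr]
    if not isinstance(expr, list):     # a number or other literal
        return expr
    operator = evaluate(expr[0], env)
    return operator(expr[1:], env)     # arguments are handed over unevaluated

def strict(fn):
    """Wrap an ordinary function so that it evaluates all its arguments first."""
    return lambda args, env: fn(*(evaluate(a, env) for a in args))

def quote(args, env):
    return args[0]                     # return the expression itself

def cond(args, env):
    for test, result in args:          # only the chosen branch is evaluated
        if evaluate(test, env):
            return evaluate(result, env)
    return None

def make_lambda(args, env):
    params, body = args
    def call(call_args, call_env):
        values = [evaluate(a, call_env) for a in call_args]
        return evaluate(body, {**env, **dict(zip(params, values))})
    return call

global_env = {
    "quote": quote, "cond": cond, "lambda": make_lambda,
    "+": strict(lambda a, b: a + b),
    "<": strict(lambda a, b: a < b),
}
```

For example, `evaluate(["cond", [["<", 2, 1], ["undefined"]], [["<", 1, 2], 42]], global_env)` never touches the `["undefined"]` branch, because `cond` itself decides what gets evaluated.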


The Abelson/Sussman classic “Structure and Interpretation of Computer Programs”
starts from the few primitives and builds structures for most common types of 
programming. A really important idea here is for the student to try some others that 
are not mentioned (such as “pattern directed” and “publish and subscribe” and 
“friendly syntax", etc. extensions). 


I would most certainly urge my students to implement McCarthy's “fluents” ideas,
which allow both states progressing in time and completely functional actions on 
them. 


Many people who take the trouble to work their way through all this will have an 
enormous whammy along a number of important dimensions. (One will be to realize 
that the “normal” approach to programming and computing is disastrously meager in 
its outlook.) 


Alan Kay
Had something to do with "Object-Oriented Programming" - 3y


Can we use object-oriented methodology for designing heterogeneous
complex systems?


Here is my answer to what is essentially the same question: Software systems are
becoming more complex & highly heterogeneous therefore they require better
modeling approaches & implementation using multiple programming languages &
technologies. Can we use OO methodology for designing such heterogeneous
systems?


Alan Kay
Have designed a few programming languages - 3y


I read somewhere that object-oriented programming was coined by Alan
Kay circa 1966 or 1967 while he was at grad school. Has anyone 
influenced/contributed Alan to coin such a term? 

Since the question has been asked, it’s worth reading the detailed history I was asked
to write by the ACM in 1992, that became one of the sections of the 2nd History Of
Programming Languages conference. The Early History Of Smalltalk


In brief for here, I saw parts of the idea in various forms starting in the early 60s, and
thought it useful, but stayed asleep until in 1966 I saw Ivan Sutherland's Sketchpad
system (which completely changed the ways I looked at computing), and within a
week saw and learned the first Simula, which was less grand than Sketchpad, but 
showed how ordinary programming could be changed to take advantage of 
instantiations of processes. 


This double whammy combination “rotated” me to see things from very different 
perspectives. 


A key part of the “rotation” was that 


(1) at that time multi-processing and time-sharing systems were using hardware 
modified to isolate separate processes in the form of “virtual versions of the hardware” 




(2) ARPA was in the process of talking about doing the ARPAnet, that would allow 
many computers to intercommunicate 


(3) my two main concentrations in college had been pure math and molecular biology 


The form of the “rotation” was ridiculously simple. It was the simple realization that a 
computer could compute what any computer could compute, and thus you could 
represent anything computable at any scale using only intercommunicating 
computers (most would be virtual) as building blocks. 


This was completely impractical (which I think was one of the reasons I didn’t think of
it earlier). The molecular biology and the ARPAnet really helped, because it was known
in the mid-60s roughly that each cell in our body contained billions of informationally
interacting components, and we had 10 to 100 trillion cells in each of us. That kind of
scaling actually worked, and was far beyond what computing could do.


I think that seeing Sketchpad shocked me into being able to use “pure math mode” as 
part of the thinking rather than just the “worry about efficiency” thinking I was used to
doing when computing. If you allowed “infinitely fast and large” computing, then the 
idea made excellent sense: it was a universal building block for all scales, and what 
remained were the central problems of designing complex systems. 


The nature of the intercommunications would allow schemes that were like algebras in
pure math to be devised so that terms — like “+” or “sort” or “display” — could have
both general and specific meanings.


The huge potential got me to look at the “impractical” part, which looked much more 
doable than I'd thought (it still took about 5+ years and a great research group to do). 


LISP had already solved a number of the problems, and this proved to be a great set 
of ideas for context. 


In the 1960s, software composites that were more complex than arrays were often
called “objects”, and all the schemes I had seen involved structures that included
attached procedures. A month or so after the “rotation” someone asked me what I
was doing, and I foolishly said “object-oriented programming”.


The foolish part is that “object” is a very bad word for what I had in mind — it is too
inert and feels too much like “data”. Simula called its instances “processes” and that is 
better. 


“Process-oriented programming" would have been much better, don’t you think? 


In any case, I did not at all have “Abstract Data Types” in mind as a worthwhile goal,
even though they were obvious — and this is because "Data" as an idea does not 
scale at all well. 


You are much better off hiding how state is handled inside a “process”, only having 
processes, and treating processes as “servers” for each other. 
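
(A very rough Python illustration of hiding state inside a "process" that acts as a server for requests. This is only a sketch under assumptions: `make_counter_process` and its message names are hypothetical, a closure stands in for a real process, and it is not how Simula or Smalltalk actually did it.)

```python
def make_counter_process():
    """Return a 'process' that can only be reached by sending it messages."""
    count = 0  # the state lives inside the closure; nothing outside can touch it

    def receive(message):
        nonlocal count
        if message == "increment":
            count += 1
            return count
        if message == "report":
            return count
        # The receiver, not the sender, decides what a message means.
        return "not understood"

    return receive
```

A client that holds `counter = make_counter_process()` can ask `counter("increment")` or `counter("report")`, but has no way to reach in and set the count directly. The representation of the state could change without any client noticing.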


That is what I had in mind back then.


Alan Kay
Have designed a few programming languages - Updated 3y 


Software systems are becoming more complex & highly heterogeneous
therefore they require better modeling approaches & implementation using
multiple programming languages & technologies. Can we use OO
methodology for designing such heterogeneous systems?


I'll rewrite this question: 


What we want to do with software is generally more complex than the 
combination of our tools and understanding can handle reasonably: what 
should we do? 


We need to think about the best forms for decent solutions, and the best ways to get 
to the best forms. 


A form I like for software in general at any scale — and especially as scales get larger
(e.g. approaching the Internet in size) — is that of a system of safe modules that can 
intercommunicate to get synergies and which can be modified and added to without 
having to stop the system. 


(Note that this is also a description of the Internet itself.) 


I also like to have the interiors of the modules be in the same system form (and turtles
all the way down). However, the safeness of the modules (meaning they can’t hurt 
other modules or be hurt by them) allows all manner of interior forms, including really 
bad ones. 


“Communication” between entities means there exists a shared context — no matter 
how small — in which both entities have similar enough theories.* It is easy to see that 
descriptions in a “small language” will scale better than hoping for large 
comprehensive vocabularies in a vast library. 


There is no reason not to have communications both within physical computers and 
to modules within other physical computers. 


Part of “safeness” means that communications are no more than questions/requests (it 
would be a disaster to allow imperative actions by one module on another)— a main 
point about modules is to be able to do things whilst minimizing what any module 
needs to understand about another. 


Communicating in terms of descriptions when needed will require the 
intercommunication scheme to be able to express meanings and relationships and 
intentions in a richer way than most existing computer languages. 


A generalization of “publish & subscribe” is needed — e.g. “gifts” and “wishes” —
where the “advertising” for both is in the “little semantic language”, so that the
communications-brokerage can find useful matchups.
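
(To make "gifts" and "wishes" slightly more concrete, here is a toy Python sketch. Everything in it is an assumption for illustration: the `Broker` class is hypothetical, and plain sets of descriptive words are a very crude stand-in for a real "little semantic language". The one point it tries to show is that a module is found by describing what is wanted, not by knowing a name.)

```python
class Broker:
    """Hypothetical matchmaker between advertised gifts and posted wishes."""

    def __init__(self):
        self.gifts = []  # (description, provider) pairs

    def offer(self, description, provider):
        """A module advertises a gift by describing what it can do."""
        self.gifts.append((frozenset(description), provider))

    def wish(self, description):
        """Return every provider whose description covers what is wished for."""
        wanted = frozenset(description)
        return [provider for desc, provider in self.gifts if wanted <= desc]


broker = Broker()
broker.offer({"orders", "sequence", "ascending"}, sorted)
broker.offer({"reverses", "sequence"}, lambda seq: list(reversed(seq)))

# The wisher never mentions the name "sorted".
matches = broker.wish({"orders", "sequence"})
result = matches[0]([3, 1, 2])
```

A real scheme would need descriptions rich enough to express meanings, relationships and intentions, which word sets obviously cannot do.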


And this semantics has to extend if we want to (or need to) share a module or give it 
to another module. There might be cases in which a module will have to be
unpacked from its context to the context of the unpacker: this can only be done if 
communication can be reached on what this means. ** 


One of the bugs in current day systems that are called "OO" is that they are too often 
used as “Abstract Data Type” languages, and this form of programming doesn't scale 
at all well, nor does it handle the needed semantics well. And most of them lack a 
deep enough reflective system to be able to gracefully convert them in all the ways 
that are needed. And there are many other problems ... 


My inclination would be just to make a new language (in the same way that the
Smalltalks were made at Xerox Parc) to handle today’s scales and needs.


* Licklider wrote his first “Intergalactic Network” memo in 1963 when he couldn't 
understand why when programming in Jovial on a Q32 in Santa Monica he couldn't 
use a library function written in FORTRAN for a 7090 at MIT, given that the two 
machines were connected by a telephone link. (The “Intergalactic Network" is now 
called “The Internet”.) 


This led to the question of how you could find a library routine if you didn’t know its 
name. One of the first languages to solve part of this problem was Smalltalk. 


** Note that CORBA and many other half-hearted attempts declined to actually try to 
solve these problems (but they are indeed solvable). 


Alan Kay
Still trying to learn how to think better - 3y


How important was William English to the evolution of the personal 
computer? 

Bill English was one of my oldest friends and colleagues, and the simple answer is 
“enormously important”. 


Doug Engelbart was a great visionary — cosmic in many ways — and Bill — as the 
codirector of the “Augmenting Human Intellect” project and the chief engineer — was 
the person who both made things happen, and also found and mentored the highly 
talented NLS researchers, especially including Jeff Rulifson who was the main software 
architect and designer. 


I like the ACM's “Software Systems Award” better than the Turing Award because it
can be given to as many people as have been central to a particular software
invention and creation. The 1990 ACM SSA was given to Doug, Bill, and Jeff for NLS.


The famed “Mother Of All Demos” (see The Mother of All Demos - Wikipedia) was
designed and created by both Bill and Jeff. 


Bill had theatrical experience, understood the magnitude needed for the "show", 
designed it, made it happen, specified most of the engineering underneath it, and was
both the rehearsal director and the live director. 


Jeff was the main software arbiter and one of the central designers and implementers 
for what was attempted and shown. The scope of this was quite amazing (and beyond 
a Quora sized answer). 


Bill was also one of the best human beings I've ever had the pleasure to be around. If 
he saw that he could help, he would help. 


He helped me in many ways at Parc, especially in the early days when, quite fresh from 
grad school, I was trying to understand how to set up a research group.


We loved him then, and we love him today. 


Alan Kay
Have designed a few programming languages - 3y


Can you list and explain five (5) ways of interacting with the computer? 

This question sounds like a homework assignment. If so, you could learn quite a bit by 
trying to answer it by poking around yourself. And, ditto, if it isn’t a homework 
assignment. 


Alan Kay


Still trying to learn how to think better - 3y 


Why do good ideas often not scale? 


This was a favorite utterance of the great systems (both hardware and software) 
designer Bob Barton. His exact quote was: 


"Good ideas don't often scale” 
It was both a cautionary heuristic and a practical observation. 


For example, anyone can make a doghouse from almost anything, but trying to make 
one that is 100 times larger will generally fail: the strength of many materials goes as 
the cross-section — so the scaled up doghouse will be 10,000 times stronger — but 
the mass goes by the volume, so it will weigh 1,000,000 times more. Scalewise, it has 
gotten 100 times weaker and will collapse (and so will a scaled up dog, for the same 
reasons). 
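
(The doghouse arithmetic can be checked in a few lines of Python. This is just the square-cube reasoning from the paragraph above written out; the function name `scale_up` is made up for the example.)

```python
def scale_up(factor):
    """Square-cube scaling: strength follows cross-section, mass follows volume."""
    strength = factor ** 2           # cross-section grows as the square
    mass = factor ** 3               # volume (and so weight) grows as the cube
    weakening = mass // strength     # how much weaker it is relative to its own weight
    return strength, mass, weakening


strength, mass, weakening = scale_up(100)
# strength  -> 10_000 times stronger
# mass      -> 1_000_000 times heavier
# weakening -> 100 times weaker relative to its weight
```

The relative weakening is always equal to the scale factor itself, so every further increase in size makes things worse at the same rate.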


In simple programming, we can look at e.g. the possibilities for pair-wise pernicious
interactions between components (could be just lines of code). There are N ways to
pick one of these, and N-1 ways to pick the second one, so the number of possible
pairs grows roughly as N squared — this means that scaling
simple programming is likely to give rise to “bad things happening”.
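
(A small Python sketch of this count, to show how quickly it gets out of hand. The function name and the sample sizes are arbitrary choices for illustration.)

```python
def possible_pairings(n):
    """N ways to pick the first component, N-1 ways to pick the second."""
    return n * (n - 1)


for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} components -> {possible_pairings(n):>12,} possible pairings")
```

Making the program 1,000 times larger (10 to 10,000 components) makes the number of possible pairings roughly a million times larger.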


This simple observation is one of many good arguments for devising module schemes 
that are completely encapsulated. 


If you look at the design of the B5000 hardware and software (1962), the amount and 
degree of encapsulation at every level is breathtaking — not just by the standards 
back then, but today. And, the B5000 was the first by many years to have its operating 
system completely written in a higher level language (a form of Algol extended for 
systems programming). This was another way to reduce pernicious interactions. The 
results were astonishing wrt crashproof integrity. 


This did not prevent other vendors like IBM from scaling bad ideas (IBM was a 
marketing company and, if they could sell something successfully to their 
unsophisticated customers, that was what they measured themselves by). 


Another of the many ways to ponder this principle is to notice that human capacity for 
simultaneous ideas is very limited (for simple things: 7±2, and for most things, more
like 4±3). A situation that can be understood inside this limit can quickly get out of
hand when it is exceeded. 


Yet another way is to look at other human limitations that are called “cognitive biases" 
and find the ones that are prone to disaster when what is being thought about is 
scaled. 


For example, we have a very hard time dealing with growth that is not linear using our
“commonsense” alone. If we know that, we can use tools — like arithmetic — to
compensate by calculation. But note how easy it is for most people to be taken by 
surprise by an easily calculable exponential growth curve, such as COVID-19. 
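
(A minimal Python sketch of "compensating by calculation". The numbers are purely hypothetical: one starting case and a doubling time of 3 days are assumptions chosen to keep the arithmetic simple, not data about any real epidemic.)

```python
def cases_after(days, initial_cases=1, doubling_days=3):
    """Exponential growth: the count doubles every `doubling_days` days."""
    return initial_cases * 2 ** (days / doubling_days)

def linear_guess(days, initial_cases=1, doubling_days=3):
    """A 'commonsense' guess: keep adding what the first doubling added."""
    return initial_cases + initial_cases * days / doubling_days


month = 30
exponential = cases_after(month)   # 2 ** 10
linear = linear_guess(month)       # 1 + 10
```

Under these assumptions the linear guess after a month is 11, while the calculated value is 1024, almost a hundred times larger.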


Another one is “confirmation bias”. It can take an idea that works in limited form and 
transform it into “the greatest idea of all time”. We see this in many areas of our 
culture (and unfortunately in our governments). 


Back to this as a mantra: it is one of the most helpful, and worth using. I would call
“good ideas don't often scale” “a great observation that does scale”!


Alan Kay 

I am the Alan Kay in question. - 3y

In a talk Alan Kay gave, he said there are several reasons (6 or 7 reasons)
why America no longer hires and makes a good use of its smart people and
scientists like it used to in WW II and in The Cold War, can he mention
them?


I don't think things are as total as the question implies. But — with regard to funding
for computing as an example — I don't think there is any question that the
government funding for “edge” computer science is very far from the same quality 
and kind as it was in the 50s and especially the 60s. 


The fruits of the funding that was done back then gave rise to a number of very large 
world wide industries, new technologies, personal computing of all kinds, pervasive 
networking of all kinds, and created many 10s of Trillions of dollars of new wealth. 


The return on investment was enormous, and one would think that the government 
would want to foster more “edge” research of this kind for both the uplifting of 
humanity and the new kinds of commerce it gives rise to. 


I don’t know in a scientific way why this isn't being done today, but we can make
some guesses that won't be too far off. 


First, it's worth noting that “people in general” know little about science and technical 
fields, they are not interested in science, and in fact they distrust technical people to a 
considerable degree. This is stuff that is not normal to their culture, and what is not 
normal to a culture seems “crazy” and “dangerous”. 


So it is usually only when they are deathly afraid of a perceived danger that they are 
willing to call on technical people for help, and especially to fund them. War and the 
threat of war have been the biggest stimulants to “unusual” research, methods and 
people. 


(An interesting example today is the difficulty “people in general” have in getting 
afraid enough of COVID to even protect themselves and their own families, let alone
listen to people who are specialists in epidemics explain what is going on and the
penalties for not acting. Similarly, and even more disastrously, they can’t get afraid
enough of the climate problem to politically back what needs to be done.)


For the rest of this question, I'll point to a white paper I wrote last year for a
foundation in the UK about how maximum efforts have come about, and how they 
have been organized. There is also a part of this that talks about why there is such 
resistance. Some of the reasons discussed there include: 


• Myopia in time and environment
• Amnesia (quickly forgetting past difficulties)
• Optimism (“things will work out”)
• Inertia (especially where there is uncertainty)
• Simplification (cognitive load, etc.)
• Herding (basing decisions on societal consensus)
• Other (aversions to outside the tribe)
• Loss Aversion
• Dunning-Kruger

White paper: https://internetat50.com/references/Kay_How.pdf 


All the references from that talk: Alan Kay - References


Alan Kay
Have designed a few programming languages - Updated 3y 


What was the very first computer science fact you learned that blew your 
mind? 

My undergrad majors (in the early 60s) were pure math and molecular biology, and I
worked my way through college as a “data systems” programmer for the National 
Center for Atmospheric Research in Boulder, Colo (basically writing sorting and 
retrieval routines for the mountains of weather data that were used for analysis, and 
for one stretch, writing systems code for the CDC 6600, including several simulators of 
it and for it). 


I knew I didn't know anything about computing, but like most programmers I could
still write and debug code to do this and that. 


On the other hand, from the standpoint of pure mathematics, the computer was 
pretty simple and straightforward — and finite — the tricky parts came from the huge 
degrees of freedom over time, and sheer size of assertions and implications vis a vis 
our weak human brains that had to handle these compared to the way regular maths 
were approached as small usually infinite generalizations. 


The notions of “artificial intelligence” also were unmysterious philosophically to those 
deep into where molecular biology had gotten to ca 1965 or so. Again — as with life 
itself — what was tricky was the amount and kinds of organization required to get 
simple building blocks to “think”. This seemed like an extremely difficult, but 
eventually solvable problem. 


Simulating one computer with another — a la Turing — also seemed straightforward, 
and I had to do this several times while at NCAR.


After graduating, I was burnt out and decided to take a year off to learn a bit more
about computing instead of plunging into grad studies in my majors (or to get a 
regular job). 


I quite accidentally wound up at the U of Utah grad school in computing in late 1966,
and in my first meeting with the head of the department — Dave Evans — was 
handed a copy of Ivan Sutherland's MIT PhD thesis “Sketchpad, a man-machine 
communications system” from just a few years earlier in 1963. 


This was the first time my mind was completely blown by something in computing. 


This is because every part of it took something where I understood the category —
such as a computer can make pictures, and a computer can simulate another
computer, you can program a computer, etc. — and provided a completely startling
example that I had not dreamed of.




I don't know how to sort out the impact from what it was specifically (as the first great
invention of interactive computing/declarative programming/object oriented 
design/meta bootstrapping of tools, etc) from what it was philosophically: a 
demonstration that “simulating a computer on another computer” could be 
qualitatively different and superior in kind. 


Both of these hit me/bit me hard, and for life. The first was “it's the kind of thing we 
must invent for everyone”. The second was “it’s a new artistic medium of vast scope 
that cosmically transcends its materials”. 


Alan Kay
I'm the “Alan Kay” in question (try Google for the usual misinformation) - 3y


Has the fixed architecture of microprocessors turned out to be a wrong 
turning, given that many functions can be more easily performed in a 
programmable gate array which are rather more flexible? 

This question preaches to this particular choir, because of the great success at Xerox 
Parc in the 70s of the various microcoded personal computers done there. And before 
this, microcode (invented by Maurice Wilkes of EDSAC fame) had also been used in 
significant ways on significant computers. 


Part of the reason for the success of microcoding had to do with the large differentials 
in speed between CPU logic and most memory cycle times (this was often 5-6 times in 
favor of CPU logic, and meant that quite a bit of emulation could be done with no loss 
of systems speed). 


FPGAs overlap on some of the same territory, but with different tradeoffs — and they 
had a long development curve: it took quite a few years for the combination of 
enough gates combined with high yield optimized “soft wiring compilers” to get 
FPGAs to be practical materials. One of the tradeoffs is e.g. slower clock speeds but
immensely more possible parallelisms and commingling “processing” with “memory” ...


Bottom line for me today: it makes a lot of sense to have hybrid systems that combine 
both fixed and variable low level functions (for example: Intel has a rather large line of
offerings which combine an ARM-like processor with a fair amount of FPGA
real estate).


Alan Kay


Have designed a few programming languages - 3y 
If threads were allowed to share the stack memory, what complications will 
this create? Please make a list and discuss it. 


This sounds like a homework assignment, and — if so — I don’t think you should try
to get Quorans to do the work for you. 


Alan Kay
I'm the “Alan Kay” in question (try Google for the usual misinformation) - 3y
Are there any university math departments headed by a computer scientist? 


Good question (and I don't know offhand). But there are any number of computer
scientists who are deep enough mathematicians to qualify (e.g. Don Knuth, etc). 


Alan Kay
I'm the “Alan Kay” in question (try Google for the usual misinformation) - Updated 2y


Why is it hard to make software compatible for multiple operating systems? 


It actually isn't very hard if you are willing to put in some extra work and care (and 
these extras could be considered “hard”, but they are up front prep). 


A classic route was to make a virtual machine that is hardware independent by using 
an implementation route available on all machines — for many years, that would be a 
subset of C (where it is used as a “universal assembly code” target for the compilers 
you have made). 


Then you ask about the minimum you need from an OS (rather than the opposite) in
order to work on a particular machine and OS. This is usually access to signals coming
in — especially from the UI — and to be able to hand off a bit-map for the OS to
display on the screen. You will have written your own file system, but you need to map
it to the one the OS actually uses.
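
(One way to picture "the minimum you need from an OS" is as a very small interface that each platform has to supply, with everything else written only against that interface. The Python sketch below is an assumption-laden illustration: `HostPlatform`, its four methods, `InMemoryHost` and `run_one_step` are all hypothetical names, and this is not how Squeak's platform layer is actually organized.)

```python
from abc import ABC, abstractmethod

class HostPlatform(ABC):
    """Hypothetical minimal set of services a portable system asks of an OS."""

    @abstractmethod
    def poll_events(self): ...                          # signals coming in (UI etc.)

    @abstractmethod
    def show_bitmap(self, width, height, pixels): ...   # hand off a bit-map to display

    @abstractmethod
    def read_file(self, name): ...                      # map our file system to the host's

    @abstractmethod
    def write_file(self, name, data): ...


class InMemoryHost(HostPlatform):
    """A fake 'OS' for trying things out; a real port would call the actual OS."""

    def __init__(self, events):
        self.events = list(events)
        self.files = {}
        self.frames = []

    def poll_events(self):
        pending, self.events = self.events, []
        return pending

    def show_bitmap(self, width, height, pixels):
        self.frames.append((width, height, bytes(pixels)))

    def read_file(self, name):
        return self.files[name]

    def write_file(self, name, data):
        self.files[name] = bytes(data)


def run_one_step(host):
    """Portable code: it touches the OS only through HostPlatform."""
    events = host.poll_events()
    pixels = bytes([255 if events else 0]) * 4   # a 2x2 frame, one byte per pixel
    host.show_bitmap(2, 2, pixels)
    host.write_file("log", repr(events).encode())
    return len(events)
```

Porting to a new platform then means writing one new subclass of the interface, which is the kind of small, bounded job described in the next paragraph.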


When Squeak Smalltalk was done ca 1996 it took volunteers a few days to a week to 
get the system running bit-identical to all the other Squeaks on other platforms. In 
practice, the “compatibility code” amounted to between 1,000 and 2,000 lines of C.


Squeak Smalltalk was written in itself and was packaged as a virtual memory for a 
computer. The lowest levels were written in a subset with a translator into C. 


Bootstrapping can be fun and easy if it is part of the overall plan for doing software. It 
is often quite difficult to try and bolt it on afterward. 


A key idea in first bootstrapping an interpreted VM is that once you get the system 
going on the new platform it can then supply tools for any needed extra 
optimizations. 


Now there is a second route, which is to use JavaScript as “an assembler” target of 
one’s own compilers. The architecture of the browser is “a bit random”, but the 
JavaScript implementation is highly optimized, and the storage allocator is good 
enough. If JavaScript were more reflective, then one could write much better 
debuggers. As it is, using it as a “machine”, one can put more felicitous languages on 
top that do have much better debuggers, etc. 


I think this makes a lot of sense today because — with all its flaws (and there are
many) — the browser environment is the only automatic universal deployment
environment available on the Internet right now, and there are more and more
examples that show that the optimizations of JavaScript, and the possibilities for
WebAssembly when needed, make this a decent target for one’s own systems.


Alan Kay
Still trying to learn how to think better - Updated 3y


What did John von Neumann contribute to computers in contrast to Alan
Turing?

Alan Turing in the mid 30s explored computability*, which was triggered by interest in
Goedel's demonstration that most interesting systems of logic were not completely
decidable.


Turing did this by inventing a thought experiment that could do the simple human 
actions of following a limited form of recipe and writing and reading marks on a
“tape” (like a long strip of paper). 


The most important of the results showed that a very simple machine was capable of 
simulating *any* machine (this is called a “Universal Turing Machine”, and many very 
simple versions were subsequently devised). 


Von Neumann was one of the great mathematicians of his day. During WWII both he 
(in the US) and Turing (in the UK codebreaking project) separately got involved with 
mathematics and machines that could “do math”. 


It wasn't just von Neumann who realized that Turing’s thesis model actually could
make a lot of sense as a physical computer. This is because the equivalent of the “tape”
(some form of “memory”) can be a lot cheaper than the recipe-following hardware,
which could then be made as small as possible because more complex computers
could be emulated.


But von Neumann's influence in pushing this idea was very important in getting some 
of the first programmable computers to be developed after WWII. 


(There had already been several examples: in Germany by Zuse, and in the US by 
Atanasoff-Berry). 


And in the US progress was slow enough that Maurice Wilkes from the UK, who had
attended the seminars in the US, decided to “just do one”, and this resulted in the
EDSAC at Cambridge ca. end of ’48 (which is my vote for the first available
programmable computer**).


In practice, the “simple logic” part of these early computers was made much more 
complex than Turing’s theories demanded. This is because everything was really s l o w,
especially given the memory technologies available. This got the designers to want
to put more into the fixed part of the machine and do less emulation using the “tape”. 


This mismatch of speeds is still the case. The intermediate ground of making the fixed 
part of the machine be an emulator has been a very good solution (it made the work 
at Xerox Parc possible), but is oddly not found in most commercial CPUs. This has 
affected many things adversely, including software development, design of higher 
level programming languages, etc. 


* See the helpful comment below by Robert Langner, which includes a link to a short
history of what Turing did in the 30s.


** The Manchester “Baby” worked just before EDSAC, but I think it was too
rudimentary to count — it was a test machine. (But see Clem Coles’ harrumph
below.)


The Manchester Mark I worked for the first time a few months after EDSAC. (Not that
priority means anything, but there is still the friendly rivalry between Cambridge and 
Manchester ...) 


Alan Kay
I'm the “Alan Kay” in question (try Google for the usual misinformation) - 3y


What are old technology systems referred to? 


Normal (“Technology” is all that stuff that happened *after* you were born ...)


Alan Kay
Still trying to learn how to think better - 3y

Why do people often attribute inventions to only a single genius?

Joseph Campbell's book “The Hero With A Thousand Faces” provides part of an
answer. In this view, humans are not only genetically wired for language and stories,
but we might even have some story tropes partly built in. In any case, cultures can
preserve tropes, genetics or not.


One of them is “The Hero's Journey”. One of many reasons why this appeals is that it 
simplifies the problem of making up a story and telling it, by concentrating the action 
on a single hero. It helps that there are “prime movers” in every society who have a
bit more “oomph” to get things to happen. 


Journalists love “The Hero's Journey” trope, and so do most people. 


In telling stories about Xerox Parc, I have moved back and forth between the really
amazing individuals who made difficult things happen regardless — Butler Lampson,
Chuck Thacker, Dan Ingalls, etc. — to trying to show just how many of the ideas had
important prior art in both the ARPA community and computing in general, and also
the large process influences from WWII and Cold War projects that we learned from.


Still, the sequentiality of stories and the limitations of short term memory cater to 
simplified versions. Real history is a big tangled wall chart of influences. 


And ... though e.g. Newton somewhat required the 17th century and some of the past, 
what he did by himself is so amazingly staggering that you have to consider him a 
single genius. 


Alan Kay
Have designed a few programming languages - 3y


What term describes an instance of a program being executed? 


In antediluvian times it was called a “process”. I think terms have gotten mixed
together and less clear these days ... 


Alan Kay 
Still trying to learn how to think better - Updated 3y 


Which book has provided you the most tangible benefits in your life as a 
computer scientist? 


Most of the writings that got me going early on (starting early in the 60s) were papers. 
People who read a lot can never come up with “a single book” for anything. 


I'll just stick to the 60s. Books whose ideas and perspectives had a huge influence on 
me back then with regard to computing included: 


“The Molecular Biology Of The Gene” (1965) by James Watson (I was learning 
molecular biology, and this spilled over into computing after I started grad school in
1966). It’s all about being able to make really complex things from gazillions of simple 
parts (both the complexity and the scale were far beyond computing, and this helped 
a lot, as did some of the solutions of Biology). 


“Understanding Media” (1964) by Marshall McLuhan. I didn’t understand it when it
came out, but spent the summer of 1967 learning how to understand McLuhan (partly
by starting with “Gutenberg Galaxy”) and this was a biggie. What McLuhan was driving
at is one of the most important perspectives on human beings.


“Sketchpad: A Man-Machine Graphical Communication System” (1963) by Ivan Sutherland. His
thesis — so it counts as a book — probably had the largest effect on me vis a vis
computing when I first read it in 1966 (and to this day). It is the epitome of so many
important ideas, perspectives, aspirations, etc. in computing.


“Semantic Information Processing” (ca 1963) ed. by Minsky. This essentially was
digests of the first wave of AI theses by Minsky’s students plus several important
papers by McCarthy and Minsky. Some of them were revelatory (e.g. the Thomas
Evans Geometric Analogy solver, and the first McCarthy “fluents” paper).


“Computation: Finite and Infinite Machines” (1967) by Marvin Minsky. A truly lovely
book — another “kind of how you should be thinking about things”. Automata theory
was already off the main track in computing, but this book was so beautifully done
(and I also had a UG degree in pure math, so ...). It also got me to look at:


“Lisp 1.5 Manual” (1964) by John McCarthy et al. The front part of this is revelatory 
and struck me as “kind of how things should be done”. 


“A Programming Language” (1962) by Ken Iverson (I read it in 1967). APL had not yet 
been implemented, but this book was hugely influential on many computerists. A 
really important different perspective on programming. Put this together with Lisp and 
Simula and Sketchpad, and you can loosen up your thoughts about programming. 


“An Algol 60 Compiler” (1964) by Randell and Russell had the wonderful ideas — 
which were pulled off with a great implementation — of having models of programs 
and to have parallel semantically exact implementations for different purposes (e.g. an 
interpreter for programming and debugging, an optimizing compiler for production 
runs). This fit nicely with the advent of CAD and SIM (with Sketchpad and other early
systems) along with “automatic FAB” to get a final product; this was the kind of thing
that engineers concerned with computing could see happening to their fields
(including the field of computing). Ironically, to this day, computing in general has 
done the least with CAD<->SIM->FAB (except in the remote reaches of computing 
research in ARPA and Parc, etc.). 


Alan Kay
I'm the “Alan Kay” in question (try Google for the usual misinformation) - 3y


Are there necessary compromises between making a computer a consumer 
convenience along with being a tool for augmenting intellect? 

One way for you to answer your question is to consider the technologies of 
reading/writing and their extension into the mass world via printing. If you start with 
the larger world of “augmenting intellect” via reading and writing and printing, it is 
very easy to also use these tools for “consumer conveniences”. 


If you start with consumer conveniences, you will have a much harder time along 
many dimensions to get to the structurings needed for the larger ideas. 


I think this applies across the board for computing. A bunch of us put in a lot of effort
in the 60s and 70s to make a version of “the large ideas” that could span the world 
and be a very new kind of “book” for everyone. 


Some people use this full out. Far too many never realize what they are missing (this is 
because computers can also simulate lots of stuff that is actually “anti-augmentation 
of intellect” (namely “diminution of intellect”) e.g. Twitter). 


Alan Kay 
Still trying to learn how to think better - 3y 


How did people at PARC think about AI?


A short answer here, and I'll likely get back to this for a longer one. 


Parc had from the beginning some of the top people in AI, including Danny Bobrow,
Terry Winograd (a consultant), Mark Stefik, John Seely Brown, Richard Burton, Ira
Goldstein, and many more.


A number of impressive systems were done, including KRL, Steamer, etc. 


Alan Kay


Have designed a few programming languages - 3y 


How was the first computer system programmed before the advent of 
programming? 

There are two ideas here (a) “program” and (b) “computer”. 

Programs have been around for a long time as directions to be read and carried out
(early on just by humans). Heron of Alexandria (ca 30 AD) made a variety of
mechanical devices that were controlled by “programs”, etc.


A computer is a machine that does work on information. You can make a simple 
adding computer with two rulers: 


[Figure: two rulers placed edge to edge, one offset along the other]


This shows the addition of 2 5/8ths + 15/16ths with the result: 3 9/16ths.
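
The ruler arithmetic can be checked with a short Python sketch. The function name `ruler_add` is made up for this illustration; the idea is only that sliding the second ruler's zero mark to the first length, then reading the first ruler opposite the second length, adds the two.

```python
from fractions import Fraction


def ruler_add(a, b):
    """Add two lengths the way two rulers do (hypothetical helper)."""
    offset = a         # where ruler B's zero mark sits along ruler A
    return offset + b  # the reading on ruler A opposite mark `b` on ruler B


a = Fraction(2) + Fraction(5, 8)   # 2 5/8
b = Fraction(15, 16)               # 15/16
total = ruler_add(a, b)
whole, part = divmod(total, 1)
print(whole, part)  # 3 9/16
```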


To get a programmable computer we need to make a machine that can read a 
program of some kind, and follow its directions. 


By this definition, the Jacquard Loom is a wonderful hybrid computer: it works with 
information but its results are woven patterns in cloth. 


The Jacquard Loom was controlled by programs punched into a sequence of 
cardboard cards. 


The cards are strung together and read in sequence. The program controls which 
threads are lifted in the loom part to the left. 


Programming can also be done by “plug-boarding”, where the program is given in 
terms of connections of wires. For example, here is how a punched card accounting 
machine was programmed: 


[Figure: plugboard wiring of a punched card accounting machine]


This was actually quite fun to do.


When most people say “computer” today, they mean a “stored program computer", 
where the program is represented in the very same memory structures as the “data”. 


In operation, both the data and the program are usually represented as a “word” of
binary digits — each one “on” or “off” — perhaps 16 of these. If you look at all the
patterns for 16 on-off things, you will get 65,536 of them. You could decide to
interpret these as positive integers, or as signed integers (about ±32,768), or as two or
three text characters, etc.
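
To make "the same pattern, different interpretations" concrete, here is a small Python sketch. The helper names are invented for this example, the signed reading assumes two's complement, and the text reading assumes two 8-bit characters per word; neither choice is dictated by the answer above.

```python
def as_unsigned(word):
    """Read the 16 bits as a non-negative integer (0 .. 65,535)."""
    return word & 0xFFFF


def as_signed(word):
    """Read the same 16 bits as a two's-complement integer (assumed encoding)."""
    w = word & 0xFFFF
    return w - 0x10000 if w & 0x8000 else w


def as_text(word):
    """Read the same 16 bits as two 8-bit characters (assumed encoding)."""
    return bytes([(word >> 8) & 0xFF, word & 0xFF]).decode("latin-1")


print(2 ** 16)                                    # 65536 distinct patterns
print(as_unsigned(0xFFFF), as_signed(0xFFFF))     # 65535 -1
print(as_unsigned(0x4869), as_text(0x4869))       # 18537 Hi
```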


And you can also decide that the first 3 binary digits could represent 8 different kinds 
of operation for the machine to carry out, and the remaining 13 digits could be the 
“addresses” of 8192 of the binary digit words stored in the “memory”. 


Then a program would be a sequence of these operation words held in one place in 
memory, and part of the hardware of the computer would look at a word and do the 
operation indicated, and then go to the next word and do that operation, etc. 


The result of the operations would be to get data words from other parts of the 
memory, do the indicated operations, and then put the results back into memory. 
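
The machine described above can be sketched in a few lines of Python. The 3-bit operation field and 13-bit address field come from the description; the particular operations (`HALT`, `LOAD`, `ADD`, `STORE`) and the single accumulator are assumptions chosen only to make the sketch run.

```python
# Hypothetical opcode assignments: the text only says that 3 bits give
# 8 kinds of operation, not which ones.
HALT, LOAD, ADD, STORE = 0, 1, 2, 3


def encode(op, addr):
    """Pack a 3-bit operation and a 13-bit address into one 16-bit word."""
    return ((op & 0b111) << 13) | (addr & 0x1FFF)


def run(memory, pc=0):
    """Fetch a word, do the operation it names, go on to the next word."""
    acc = 0  # a single accumulator register (an assumption of this sketch)
    while True:
        word = memory[pc]
        op, addr = word >> 13, word & 0x1FFF
        pc += 1
        if op == HALT:
            return memory
        elif op == LOAD:
            acc = memory[addr]
        elif op == ADD:
            acc = (acc + memory[addr]) & 0xFFFF  # stay within 16 bits
        elif op == STORE:
            memory[addr] = acc
        else:
            raise ValueError(f"operation {op} is not defined in this sketch")


memory = [0] * 8192                 # 13 address bits reach 8192 words
memory[0] = encode(LOAD, 100)       # program and data share one memory
memory[1] = encode(ADD, 101)
memory[2] = encode(STORE, 102)
memory[3] = encode(HALT, 0)
memory[100], memory[101] = 2, 3
run(memory)
print(memory[102])  # 5
```

Note that the program words at addresses 0 to 3 are ordinary numbers in the same list as the data, which is the whole point of "stored program".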


I went through this in some detail in order to answer your question.


I think you can now see that “programming” is quite independent of having a machine
to program. A program — in the terms we are using here — is “information” that
“describes” a sequence of actions that can be “interpreted” by processes (could be in
human brains or some form of machinery).


In our example above, we can program our first computer by hand, by directly writing 
a sequence of 16 digit words with the on-off notations — a mark or not, or a hole or 
not — that correspond to the instructions we want to have carried out. 


These can be entered into our computer using switches on the panel, punched paper 
tape or cards, etc. 


A somewhat different answer I wrote a while ago might also help:
How were computer programs and languages like COBOL installed into computers
back in the 50s before software?


Alan Kay
Have designed a few programming languages - Updated 3y


Is there a programming language that uses past and future tense? 


Some very confident (why?) answers here about “no, there aren't programming 
languages with past tense”. Or “there isn’t any reason for such a thing”. 


But take a look at Lucid, by Wadge and Ashcroft. Its variables are histories of the
values, so the various kinds of past tense can be used (see the Wikipedia article
“Lucid (programming language)”).


Also, there have been a number of experiments and proposals for “world-lines” in 
programming, especially “good old real AI” programming. These go all the way back
to John McCarthy's “Situations, Actions, and Causal Laws” papers in the 60s. A 
“situation” is a past whole state of a system, and it is reached by a “past tense” 
operation. 


These ideas are important on a number of fronts (for example, think of the many ways 
that a “general UNDO" at any granularity could vastly help programming and 
debugging). Both databases (usually) and user interfaces (sometimes) have unlimited
undos and views of the historical past, but this is also very useful for finer grained
programming, AI, etc.
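
A very small Python sketch of a variable that is a history rather than a single cell, with a "past tense" read and an undo. This is not Lucid syntax and not the mechanism of any system mentioned here; the class `HistoryVar` and its method names are assumptions for illustration only.

```python
class HistoryVar:
    """A variable that keeps every value it has ever held (hypothetical helper)."""

    def __init__(self, initial):
        self._values = [initial]

    @property
    def now(self):
        return self._values[-1]

    def assign(self, value):
        self._values.append(value)  # never overwrite, only extend the history

    def was(self, steps_back=1):
        """Past tense: the value `steps_back` assignments ago."""
        if steps_back < 0 or steps_back >= len(self._values):
            raise IndexError("no such point in the past")
        return self._values[-1 - steps_back]

    def undo(self, steps=1):
        """Drop the most recent `steps` assignments, keeping the initial value."""
        keep = max(1, len(self._values) - steps)
        del self._values[keep:]
        return self.now


x = HistoryVar(1)
x.assign(2)
x.assign(3)
print(x.now, x.was(1), x.was(2))  # 3 2 1
print(x.undo())                   # 2
```

Doing this per variable is the fine-grained end of the spectrum; a whole-system version would snapshot many such histories together.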


Histories and tenses can be implemented in languages with a decent meta-framework 
(e.g. Lisp, Smalltalk, etc.). Here is a paper about such an experiment: 
http://www.vpri.org/pdf/tr2011001_final_worlds.pdf


Alan Kay
I'm the “Alan Kay” in question (try Google for the usual misinformation) - 3y


What did you think of How Did We Find Out About Computers by Isaac 
Asimov? 

I like Asimov, but this book doesn't explain anything that would help a child learn how
computers work — how to make them, program them, get them to do a few semi- 
intelligent things, etc. 


Alan Kay
I am the Alan Kay in question. - Updated 3y


What is the story behind Alan Kay's adage “Simple things should be simple, 
complex things should be possible”? 

I think I came up with this slogan at Parc during discussions wrt children, end-users,
user-interfaces, and programming languages. Chuck Thacker (the genius behind the 
Parc hardware) also liked it and adopted it as a principle for many of his projects. 


So e.g. Smalltalk needed to work with children and end-users even more intuitively 
than (say) JOSS or Logo. But we also wanted to write the entire system in itself, so that 
those who were curious — especially later on — could “pop any hood" in the system 
and see a live program/object written in exactly the same terms as what the children 
were learning. 


Similarly, the GUI had to be easily learnable by children, but — looking ahead — it had 
to handle “50,000 kinds of things we hadn't thought of done by 50,000 programmers 
we hadn't met” and be as simple as possible. 


Another part of this was that we were determined to have a very easy to learn UI that
would also incorporate end-user programming (scripting) as a natural part of it — in
other words to combine what had to be simple yet possible with the programming
language with what had to be simple yet possible with the UI.


The general zeitgeist is against this idea — both back then and now. Basically: those
artifacts that do simple things usually wall off next levels of complexity, and those that 
do complex things don't do anything simply. 


But, given that there have been some really good examples of how to do both, it's
hard not to see most computer people as (a) not caring, or (b) being lazy or unskilled,
or (c) both.


It's worth pondering this. One argument against mine is that “people need and want
‘appliances’” that only have one function.


I would say, “Well, that’s the simple part, now show me how the next idea and need
the end-user has that is not directly in the surface appliance can be realized” (this is
the enormous difference between a computer artifact and anything made from atoms
— and it means that the analogy to appliances is not a very good one). There is just
no reason not to allow and make possible safe “hood popping” ...


Alan Kay
Have designed a few programming languages - Updated 3y


As a software engineer, if you were the professor of an "Introduction to 
Programming" course at a college, what programming language would you 
choose to introduce newbies into the world of programming? Why? 


For me, this question is out of the scope of a Quora answer. 


I'm a full time researcher, but also an adjunct professor of “CS” at UCLA (which means
I don't have to go to faculty meetings, I can teach a class when I want to meet some
students, and I can choose to work for free (which I choose to do)). I usually will do
something in grad school or upper division (and usually with other professors).


The problem with teaching a "101" course in most places in the US is that there are 
enormous demands for the course to be a prerequisite for the courses to follow. 
Nothing that I would choose to do would satisfy most (or any) of the constraints.


But I think a good way to approach this is to choose “project types” and look at
various ways to handle the project — first, if humans had to do the project, and then
to use a “pseudo-code” to write down the strategies and tactics.


Some projects are like assembly lines, some are like simulations of systems, some are 
like “constitutions”, some are like “ethics”, some are like mathematics, some are like 
building a bridge, etc. Styles of programming and languages have evolved to try to 
supply better tools for each kind of problem area. 


I think having many perspectives at the beginning is critical because many people
seem to be like Lorenz's ducklings and think the first programming language they see 
is “their momma" and then follow this around for the rest of their lives. This is a bad 
idea even if the first programming language is on the “relatively better” side of this 
shifty tribe. 


Many programming languages are rather like the underlying hardware, and this is a 
distraction from the big idea of computing: if you have a computer you can make any 
kind of computer that will better serve your needs. This is a meta-idea rather than a 
“slogging-idea”. It is the one least heeded by most programmers in industry. 


Since any computerist who should be teaching about programming — and especially 
“software-engineering” — should know enough to make a pseudo-code run 
(especially on top of some dynamic substrate e.g. JavaScript), I would make a
programming environment for the class in which the different kinds of strategies and 
tactics can be explored. 


One of the most fun classes I was involved in a few years ago at UCLA was with
professors Todd Millstein and Alex Warth, in which we used Alex's “language generator”
— called Ohm — to quickly make up programming languages for different styles 
(think SICP but with nicer looking things to work with, especially for beginners). The 
resulting languages were interpreters but ran plenty fast enough to explore a wide 
range of projects. 


The students for this class were not beginners, and the class was about inventing, 
designing, and making languages, but the tools involved could easily be put to the 
purpose of making a pedagogical language for beginners. 


If successful, the students would come out at the end with many different perspectives 
on programming, be able to do them to a reasonable extent, and (I would hope) also 
with a profound dislike of most of the existing programming languages in the world, 
especially the most used ones. Some of these students will go on to invent much 
better ways to program (as opposed to the mass of programmers who think coping
with misfit tools is being “heroic” (instead of “yikes!”)).


In other words, I don’t think the primary purpose of college is for vocational training,
especially in programming (there are trade schools for just this purpose). The main 
purpose of college is to provide enough new and different perspectives to raise one’s 
level of thinking both overall and with regard to the subjects taught. And this should 
especially be the purpose of college in the first few years. 


Alan Kay
Still trying to learn how to think better - 3y 


Is the second best way to predict the future to fund it? 


Note that the *easiest* way to predict the future is to *prevent* it. Or do nothing. Then 
the future will be like the present and the past (as it was for hundreds of thousands of 
years after humans appeared on the planet). 


And that the phrase “the best way to predict the future is to invent it” is about 
willpower more than goodness of goal (the politicians in Washington are inventing the 
future every day, and it is a bad one). 


There is plenty of funding around, but what's required is the *kind* of funding that 
was typified by JCR Licklider at ARPA and Marvin Denicoff at ONR.


Alan Kay
Still trying to learn how to think better - 3y


I saw the first half of 2001: A Space Odyssey, but it was annoyingly slow 
paced, and I didn't see the need for that “Dawn of Man” sequence. Is it
worth watching the rest of the movie? 

In Business, “the customer is always right” (this is why we buy/pay) — a large part of
the process is to “find a desire and fill it”. 


In Education “the learner is usually wrong” (this is why we learn) — a large part of the
process is to help the students to grow their minds to be larger than when they 
started. 


If a person going to a movie or theater or buying a book, etc., considers themselves a
“customer”, they will be upset if what they are buying is not what they wanted.


If they consider themselves a “learner”, then they will be willing to go through a much
longer process to see if learning from “an unusual exceptional mind” will be wonderful
*after* they have changed enough to be able to deal with the ideas.


Today is a tough time for *Real Education*. Children grow up as consumers/customers 
and most remain that way as adults. The mass media and consumer industries cater 
almost completely to satisfying desires, and almost not at all to growth. 


The universities are now businesses, with students as customers, and students choose 
what to be certified in. The pace is set for those things that can scratch an itch quickly 
without any new learning. 


Socrates didn't charge for his teachings because he would then have customers rather 
than students ... 


All this hasn't put *real education* completely out of business, but it happens with a 
very small percentage of the population (generally too small a market for mass 
providers to be interested). 


Another movie that has some of the same approach and pacing, etc., and is also very 
much made more for “learners” than “customers”, is Ariane Mnouchkine’s “Moliere”. 
What she is trying to help us with can’t be done like a “customer movie". 


One way to help think about this is to consider the difference between normal prose 
and poetry (why is poetry the way it is? why put readers through an experience that is 
not like the ordinary oral discourse they are used to? Etc.) 


Or to think about what does it mean to try to do *what real theatre does* in a movie? 
(Especially what Greek Theatre does?) Some of the elemental contact between actors 
and the human condition and the audience is lost — how to get back the contact? 


Mnouchkine was one of the most brilliant theatrical directors in France (and her father 
was a famous French film producer and actor). 


Her solution was to get back the elemental contact and *transport* by using 
immersive visual language and design — to the point that you can almost smell where 
you are — as the poetry and to almost eliminate spoken dialog — this is done 
astoundingly well and for almost 5 hours (many who have seen this movie — 
including myself — have reported feeling quite sad when they realized that it was 
coming to an end). 


2001 took much of the same approach 10 years earlier, with, in many ways, a more 
difficult set of ideas to try to get below/beyond regular thinking. 


Quite a bit of great art is dealing with “the world is not as it seems”, and the artists try 
to find ways to get beyond the quick rejections of ideas that violate what people 
believe and that which remains invisible to them. 


Alan Kay 
Have designed a few programming languages - Updated 3y 




What aspects of Lisp influenced Smalltalk? 

I've written about this variously elsewhere: including in “The Early History of Smalltalk", 
and I've emphasized especially that “the idea of Lisp” pervades the underpinnings of 
Smalltalk as it was developed at Xerox Parc. 


For me, it was spending a Sunday afternoon in the late 60s tracing through John 
McCarthy's eval-apply for Lisp that rotated my perspective to thinking about 
computing “that way”. This included how late binding everywhere and finding a 
universal composable and embeddable computing element could astoundingly reveal
“hidden simplicities" in programming, its design, and implications for programming 
languages. 


This got me to go back and really look at Peter Deutsch’s PDP-1 interactive Lisp in the 
Bobrow & Berkeley book “The Programming Language Lisp” — the entire PDP-1 
assembly code was included in the paper, and Peter had realized that if you've got a 
really good live language, that you don’t need a separate OS. I was using a very similar
approach for the Flex Machine personal computer, and the Flex language was derived 
from Euler, an extreme generalization of Algol by Wirth that came close to being a 
Lisp. 


(At Parc the several versions of Smalltalk that we did used internals that were variants 
of the internals of both Lisp and Euler.) 


The peculiar organization of Lisp — once you got used to it — made deep 
fundamental things easier to think about than Euler (and the general Algolish 
approaches of the day). 


One thing that stands out is that about half the tiny McCarthy interpreter is only there 
because he was thinking about functions that evaluate their arguments before the 
lambda form is “applied” to them. This requires forms of quoting to delay evaling for 
the “special forms” of QUOTE, COND, LAMBDA, etc. But you don't need that, and the 
MIT Lisp 1.5 people had already introduced the idea of an FEXPR, which doesn't eval 
its arguments at call time, but simply is handed the argument list. The FEXPRs can 
then eval parameters if needed and desired (this allows e.g. "COND" to just be a 
vanilla FEXPR). 
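
Here is a toy evaluator in Python that sketches the FEXPR idea: ordinary functions get evaluated arguments, while an FEXPR is simply handed the unevaluated argument list and decides for itself what to evaluate. This is an illustration only, not McCarthy's interpreter or Lisp 1.5; the representation (Python lists for forms, strings for symbols) and the names `Fexpr`, `evaluate`, `cond`, `quote`, and `run_with` are assumptions of the sketch.

```python
class Fexpr:
    """Marks a procedure that receives its argument list unevaluated."""

    def __init__(self, fn):
        self.fn = fn


def evaluate(expr, env):
    if isinstance(expr, str):        # a symbol: look it up
        return env[expr]
    if not isinstance(expr, list):   # a literal such as 0 or True
        return expr
    head = evaluate(expr[0], env)
    args = expr[1:]
    if isinstance(head, Fexpr):
        return head.fn(args, env)    # handed the raw argument list
    return head(*[evaluate(a, env) for a in args])  # arguments evaluated first


def cond(clauses, env):
    """COND as a vanilla FEXPR: evaluate tests in order, then only one result."""
    for test, result in clauses:
        if evaluate(test, env):
            return evaluate(result, env)
    return None


def quote(args, env):
    return args[0]                   # return the form itself, unevaluated


base_env = {
    "cond": Fexpr(cond),
    "quote": Fexpr(quote),
    "<": lambda a, b: a < b,
    "/": lambda a, b: a / b,
}

expr = ["cond",
        [["<", "x", 0], ["quote", "negative"]],
        [["<", 0, "x"], ["/", 1, "x"]],
        [True, ["quote", "zero"]]]


def run_with(x):
    return evaluate(expr, {**base_env, "x": x})


print(run_with(0))   # zero  (the division by x is never evaluated)
print(run_with(4))   # 0.25
```

No special cases for QUOTE or COND appear inside `evaluate`; both live in the environment like any other procedure.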


Also, Lisp raised the sibling question in a thinkable way about passing a lambda with 
free (global) variables. 


• What does this mean in the new environment? Look for them dynamically in
the new environment? Can be useful.


• Bind the globals to their values in the calling environment and package the
lambda with its own a-list? Very useful (and very like a Simula-style object).


• Search for the values of the globals dynamically in the calling environment?
Also very useful, and rather like the *thunk* ideas in Algol for “calling by
name”.
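
A small Python sketch contrasting two of these choices: looking up a free variable dynamically at the time the lambda is applied, versus binding it when the lambda is packaged with its own a-list. Environments are plain dictionaries here, and `make_dynamic`, `make_closure`, and `scale` are names invented for this example; it does not try to model the thunk case.

```python
def make_dynamic(body):
    """Free variables are looked up in whatever environment is current at call time."""
    return lambda env, *args: body(env, *args)


def make_closure(body, defining_env):
    """Free variables are bound once and packaged with the lambda (its own a-list)."""
    captured = dict(defining_env)  # snapshot of the bindings at packaging time
    return lambda env, *args: body(captured, *args)


def scale(env, x):
    return env["factor"] * x       # "factor" is the free variable


caller_env = {"factor": 10}        # where the lambda is created and passed from
callee_env = {"factor": 2}         # where it eventually gets applied

dynamic = make_dynamic(scale)
closed = make_closure(scale, caller_env)

print(dynamic(callee_env, 3))  # 6   uses the new environment's factor
print(closed(callee_env, 3))   # 30  uses the factor it was packaged with
```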


Lisp also had a second set of interesting ideas that were “loosely” (meaning “not 
really”) integrated into the rest of the language: the property lists on atoms. If you 
were thinking “Sketchpad” or “Simula” thoughts, and looked at Lisp, you would see 
some of what you wanted in closures and some of what you wanted in property lists. 


The term “object” in the 60s meant “a compound data structure with fields”, and this 
was another way to use property lists. Since you could stash procedures on the 
property list along with anything else you wanted, you could make a whole little 
world. If you had the procedures have a global variable that contained a pointer to the 
property list the procedure was part of, then quite a bit could be done. 
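
A sketch of that idea in Python, using a dictionary as the property list. The names `send`, `increment`, and `describe` are made up for this example, and the pointer to the property list is passed as a first argument here rather than held in a global variable as the text describes.

```python
def send(plist, name, *args):
    """Find a procedure on the property list and run it with a pointer to that list."""
    return plist[name](plist, *args)


def increment(self, by=1):
    self["count"] += by
    return self["count"]


def describe(self):
    return f"{self['label']} = {self['count']}"


# Data fields and procedures sit side by side on one property list.
counter = {
    "label": "clicks",
    "count": 0,
    "increment": increment,
    "describe": describe,
}

send(counter, "increment")
send(counter, "increment", 5)
print(send(counter, "describe"))  # clicks = 6
```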


Interestingly, with this aspect of Lisp, the way it was done made it harder to “think 
good thoughts” than was helpful. 


The offspring of Peter's PDP-1 Lisp was BBN Lisp (1.85) and it supplied “all of the 
above”, and much more. 


Another interesting dynamic back then was Warren Teitelman’s thesis “Pilot”, his 
heroic pass at McCarthy's “Advice Taker” via meta programming rather than Al. Many 
of these ideas were incorporated in BBN Lisp when Teitelman came to BBN. (This and 
more became Interlisp.) 


And then there were more and more really interesting Al systems — many of them 
thesis projects — that were done in Lisp. 


I thought that the key principle here was that Lisp allowed you to learn more — in part
by being able to try more and change your mind more — than any other 
programming system. This was especially the case when you needed something that 
the bottom of the system didn't have — the meta reflective properties of Lisp gave 
you more recourses to *still* do something that would help. 


In a world in which almost nothing was known about programming — even by the 
best programmers — this was really huge. 


| don’t think a lot more is known today about programming than then, though Parc 
did make a few important advances. But many of the advances happened precisely 
because Smalltalk manifested its ideas about “instantiating intercommunicating 
processes” by using Lisp techniques. If I had to do it again, I would retain much of this
approach, but would do a very different design in place of "Lisp sharing”. 


Alan Kay
Have designed a few programming languages - 3y

What does Alan Kay think of the proposed 50th anniversary celebration of 
Smalltalk in 2022? 

I think especially celebrating Dan Ingalls would be a key point in a 50th anniversary
celebration (if it weren't for him, the various Smalltalks developed at Parc would not
have happened).


Secondarily, I would celebrate Chuck Thacker, as the main designer and builder of the
Alto computer, on which the first Smalltalks were possible and made to run. 


There is a lot more than cliched “engineering” here with both these two guys — it is a 
bit difficult to adequately explain the combinations of art and technique and will — 
and real genius — that allows people like these to transcend what's around them. 


I take this perspective because I see computing as something with so many different
kinds of degrees of freedom and self-referencing that defy mathematical type proofs 
for most promising ideas. And, if the topics are "personal computing” and “software 
science & engineering”, real artifacts are called for, not just to vet ideas but to also use 
them in ways deeper than philosophical mumblings. 


The actual realizations of “thought breakthroughs” in this end of computing have
required groups of highly able talents of many kinds to be able to cooperate on 
projects larger than are possible by a single person. 


This means that the celebration should include not only the precursor individuals and
groups, but also the extremely important colleagues who operated in the magnetic 
fields of Ingalls and Thacker. 


This makes for a history that is more complicated than the story forms that we 
humans like to use (and restricting the history to story forms — as has been done by 
most writers — has had the unfortunate side effect of removing some of the
“richness that should not be removed”). This has been the case for other large
breakthroughs in computing (e.g. Engelbart's visions and their important partial 
realizations by hardware people led by Bill English and software people led by Jeff 
Rulifson). 


Finally, I would celebrate the ARPA community processes (and those of Parc which
grew out of these), which made doing this research “happy and easy”, meaning: it was 
only about finding breakthroughs that enabled and progressed Licklider's vision: “The 
destiny of computing is to be interactive intellectual amplifiers for all humanity 
pervasively networked worldwide”.


There's nothing happier than working on a truly great vision with great people, and 
nothing easier than to solve really difficult problems when all the time you have can 
be spent thinking about and approaching and redefining them in any way you choose. 


Alan Kay 
Still trying to learn how to think better - 3y


Why did almost no one in the US know about Charles Babbage in the 1930s, 
as mentioned in the book The Dream Machine? 


Alexis de Tocqueville wrote about Americans in the 1840s: “They have no past and no 
future, but live in an extended present”. 


I think it's quite fair to say about most Americans that History of almost any kind is
very far from their main interests or past studies. 


Alan Kay
Invented and designed a few user interfaces ... - 3y


Are modes a feature or a bug in interface design? 


Quite a bit of early user interface design was done by the ARPA community starting in 
the early 60s — as an outgrowth of JCR Licklider’s vision: “It is the destiny of computers 
to become interactive intellectual amplifiers for all humanity pervasively networked 
worldwide” — after he was asked by ARPA to fund projects to further this vision. 


It was soon discovered that “modes” — being in a particular set of states and not 
others — were useful when you wanted to be in a particular state of affairs (like typing
text in the middle of a paragraph), but painful when you wanted to do something else,
had forgotten you were in the mode, and had to get out of it.


Tom Ellis and others at RAND invented the first really good tablet, and decided to 
make an interactive system only using a single stylus. This got them to invent the first 
good gesture/character recognition interface, and then the more comprehensive 
GRAIL system (GRAphical Input Language system). Many of the interactions of this 
system were “modeless” (in the sense that the system would let you do whatever you 
were doing, but would then get things back to being able to do anything). Many 
insights about what was possible with a good pointing device and a display came out 
of this system. 


By contrast, the Engelbart system, which had many other important features, and 
included a pointing device, was hugely modey. This was one of several reasons it was 
difficult to learn. For us at Parc, it was a non-starter when we included children as 
prime users within personal computing. 


At Parc we decided that “modeless” didn't mean no modes, so much as it meant “you 
can always do the next thing you want to do and the system will exit you safely out of 
anything you might have been in that was helping actions you had been doing”. 


For example, in the overlapping window UI, what was in a window was a mode, but at 
anytime you could touch another window and it would come up to the surface to be 
the new focus of attention. (Everything was a “window", including buttons and icons, 
etc.) 


As UI got to be more studied and designed, we realized that quite a bit of one
important dimension had to do with “cognitive load” (managing the small number of
“chunks” to which we can pay attention at any time — the classic “7±2”, which is really
more like “4±3”).


The small number of chunks means that we can forget both our current state and 
often future options when we are engaged in something, and it is one of the jobs of a 
good UI to help the user with this. 


So we need to find ways to allow “the next thing thought about” to be done. We need 
to organize things so there are options for "next thing” when the user gets stuck. Etc. 
For example, it really helps to have a visible menu of options just for this purpose 
(menus that can appear from a function key are often forgotten as an option when the 
user is under stress, etc.). (Note how bad the Quora UI is for someone trying to
write an answer here: they crazily have tied the meager editing menu to the top
of this scrolling text rather than having it float to the side or having it always
appear at the top *outside of the text* — so I can’t see the options right now —
this is very typical of the generally terrible UI schemes found in the web and web
browser ...).


One of the big helps was the realization that the “object-oriented languages” being 
experimented with at Parc that had the syntax of <object> <message> also furnished 
a way to think about modeless in a UI: select an object or objects (which will
illuminate menus of the possible messages), then select the message you want to send to
the selection. This is quite modeless: it allows all kinds of further selections after the 
first selection, including other objects that are not menus of messages. We loved this 
identity between the visible and the symbolic parts of the Smalltalk system. 


A big insight was by Larry Tesler — and independently by others — that if pointing at 
text was interpreted as pointing between characters rather than at them, then any 
selection possibility would be just a “zero to n selection” (with zero putting the 
selection in between two characters). This meant that any typing action would be 
modeless — there was no longer an “insert mode” that eventually had to be gotten 
out of: all typing was “replace the selection”. 
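
The "zero to n selection" idea can be sketched in a few lines of Python. This is only an illustration of the principle, not code from any Parc system; the class and method names are hypothetical. Positions count the gaps between characters, so a selection of length zero is simply a caret.

```python
class TextBuffer:
    """Text plus a selection given as two gaps between characters."""

    def __init__(self, text=""):
        self.text = text
        self.start = self.end = len(text)   # zero-length selection at the end

    def select(self, start, end=None):
        """Select from gap `start` to gap `end`; omit `end` for a bare caret."""
        end = start if end is None else end
        lo, hi = sorted((start, end))
        if not (0 <= lo and hi <= len(self.text)):
            raise IndexError("selection outside the text")
        self.start, self.end = lo, hi

    def type(self, chars):
        """The only typing rule: replace the selection, leave a caret after it."""
        self.text = self.text[:self.start] + chars + self.text[self.end:]
        self.start = self.end = self.start + len(chars)

buf = TextBuffer("modes are good")
buf.select(10, 14)      # the four characters "good"
buf.type("bugs")        # replacing a selection
buf.select(0)           # zero-length selection before the first character
buf.type("real ")       # inserting is the very same operation
print(buf.text)         # real modes are bugs
```

Insertion, replacement and deletion (typing an empty string over a selection) all go through the one `type` rule, so there is no separate insert mode to enter or leave.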


Another idea that was a venerable way to institute modeless was a "meta key” that 
could produce a “fleeting mode”. For example, the shift key on a typewriter keyboard 
was a “fleeting mode”: just getting off it to do something else terminated the mode 
(as opposed to the “caps lock” key which toggled, and thus introduced a real mode). 
Many early keyboards in the 60s had lots of different fleeting mode meta keys, and 
most keyboards today retain a few (such as command, control, option, etc.). 


One “mode” that caused a lot of discussion in the early days was “painting/drawing”. 
We had done one of the earliest good drawing systems in the early days of Parc to 
make use of the early bit-map displays. There is no question that “drawing wants to be 
a mode", you are doing lots of strokes one after the other and you want to stay there 
for a while. The first such system we did had a visible menu of tools for brush size and 
gray tones, and these were modeless (the system could detect when you were out of 
the drawing area and in the menu area). You could use a fleeting mode key to give 
commands via gesture recognition, etc. The tools showed as the cursor to remind the 
user about the mode they were in, etc. 


One of the experiments was to allow painting/drawing to be “shaken off” by a violent 
gesture. Note that this was still “modey” — you were explicitly getting out of the
painting/drawing mode — but it was at least in the psychic space of tools on the 
cursor. 


A really big annoying mode was to have changes in media (text, drawings, etc.) be 
permanent. Warren Teitelman had been experimenting with comprehensive UNDOing 
in UI in Lisp 1.85 (later Interlisp), and most Parc systems incorporated quite a bit of
UNDOing into their UIs for both programming and media manipulation. Apple did a
good job of following this up for the early Lisas/Macs. 
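
As a rough illustration of why UNDO removes the "permanent change" mode, here is a small Python sketch. It is an assumption-laden toy (a snapshot stack with hypothetical names), not a description of how Interlisp or the Parc systems implemented UNDO.

```python
class UndoableDocument:
    """Each change first saves the prior state, so no change is permanent."""

    def __init__(self, text=""):
        self.text = text
        self._history = []          # stack of earlier states

    def replace(self, start, end, chars):
        """Replace text[start:end] with chars, remembering the old text."""
        self._history.append(self.text)
        self.text = self.text[:start] + chars + self.text[end:]

    def undo(self):
        """Restore the previous state; do nothing if there is none."""
        if self._history:
            self.text = self._history.pop()

doc = UndoableDocument("draw a line")
doc.replace(7, 11, "circle")
doc.replace(0, 4, "erase")
print(doc.text)     # erase a circle
doc.undo()
doc.undo()
print(doc.text)     # draw a line
```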


It's worth noting that quite a bit of modiness has crept back into UIs in computer
systems today. I think this is from a combination of laziness and ignorance on the part
of so many developers today. 


Virtually all *real modes* are a bug in UI design. 


Alan Kay
Still trying to learn how to think better - 3y


Are there problems where a solution by symbolic computation (using only 
computer algebra regardless of difficulty or time required to solve them) is 
absolutely impossible that can only be solved with numerical methods? 
What are some examples? 

Many non-linear problems don’t have algebraic solutions, and numerical methods 
have been used starting long before computing machinery to help. For example, most 
orbital mechanics, perturbation problems, fluid dynamics, etc. 
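
A small orbital-mechanics example, offered as a sketch rather than as part of the original answer: Kepler's equation M = E - e sin(E) cannot be rearranged into a closed-form expression for E in elementary functions, but a few Newton iterations solve it numerically. The function name, starting guess and tolerance below are illustrative choices.

```python
import math

def eccentric_anomaly(mean_anomaly, eccentricity, tol=1e-12, max_iter=50):
    """Solve Kepler's equation M = E - e*sin(E) for E by Newton's method."""
    # A common starting guess: M itself, or pi for highly eccentric orbits.
    E = mean_anomaly if eccentricity < 0.8 else math.pi
    for _ in range(max_iter):
        residual = E - eccentricity * math.sin(E) - mean_anomaly
        E -= residual / (1.0 - eccentricity * math.cos(E))
        if abs(residual) < tol:
            return E
    raise RuntimeError("Newton iteration did not converge")

E = eccentric_anomaly(mean_anomaly=1.0, eccentricity=0.3)
# Substituting E back in should reproduce the mean anomaly (1.0) closely.
print(E, E - 0.3 * math.sin(E))
```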


Alan Kay
Have designed a few programming languages - 3y


What are the most important elements of computing history not widely 
known by current CS graduates or working developers? 


I think this question is much too large for a Quora sized answer (in part because of a 
combination of a lack of curiosity in the field generally, and a general lack of emphasis 
on history in schooling). 


Just to pick four here in no special order — and from about the same period (~ 1962) 
— how about: 


1. John McCarthy's approach to computation and reasoning that allowed large
complex interrelated “situations” to progress forward in states but retained the ability
to use logic to reason about causality and relatedness (“fluents” etc.). Adopting this
idea instead of the much weaker notions of “semaphores” would have made a big
difference in every part of computing including much of programming, systems
designs, and AI.


Anyone in computing should have read and understood everything that McCarthy did 
and wrote (this doesn’t mean he was right about everything, but “interesting people 
are interesting whether they are right or wrong”, and John was one of the most 
interesting in his field and century). 


cf. McCarthy, J. (1963). Situations, actions and causal laws. Stanford Artificial 
Intelligence Project: Memo 2. Also in “Semantic Information Processing” (Minsky
ed.). 


2. Doug Engelbart’s actual goals and plans (as expressed in his original proposal to 
the Air Force in 1962). Understanding and acting on this would have allowed the Moore's
Law revolution into personal computing to have taken full advantage of the deepest 
ideas about it by Doug, and some of those after him who developed these ideas 
further. (The current state of affairs is such a botch as to not even address most of the 
important ideas at all, let alone doing them badly). 


cf. Douglas Engelbart (1962) Augmenting Human Intellect — A Conceptual Framework 


3. Ivan Sutherland's “Sketchpad” system — done in 1962 — which was not just 
about the invention of interactive computer graphics, CAD, object-oriented design 
and relationships, but also showed how programming really needed to be developed 
forward in computing, for all programming, and especially for end-users. 


cf. Ivan Sutherland (1963) “Sketchpad: A Man-Machine Graphical Communication System”


4. Bob Barton's “A new approach to the functional design of a computer system” 
(~ 1961) which showed how hardware architecture should start to seriously cater to 
the advent and great need for much higher level programming languages and 
environments. This led to the Burroughs B5000/5500/6500 etc but had very little 
impact on what most people thought mainstream HW and SW was about (quite
unfortunately, to this day).


cf. Robert S. Barton: “A new approach to the functional design of a computer system", 
Proc. WJCC, 1961


Note: All four of the above early sixties ideas and insights by out and out geniuses had 
a great influence on subsequent work in the ARPA community, and especially at Xerox 
Parc. 


Alan Kay
Still trying to learn how to think better - 3y


Is our capacity for problem solving limited by abstraction? (Particularly in 
Computer Science) 


Marvin Minsky liked to say “You can't teach calculus to a cat”. So, sure, it is extremely 
likely that there are problems (and solutions) we can’t understand or even see. 


On the other hand, we already use what Jerome Bruner called "Goedelization” to 
greatly extend our biologically created brain (he meant that we can use our language 
facilities to build smarter “brainlets” and make them work for us inside our heads — 
e.g.: calculus is just such an example of us inventing and internalizing a kind of 
computer/mental-process to extend our abilities to think). The sibling of language — 
culture — can carry these inventions forward, and especially to young children in an 
even more powerful way (because they can more easily take on the epistemological 
perspective of the invention, not just the use of it). 


These language and cultural extensions were greatly amplified — some of them likely 
caused/catalyzed — by the invention of writing, and eventually printing. 


Many of the early pioneers of computing — McCarthy, Minsky, Licklider, Engelbart, 
Sutherland, Papert, etc. — saw that an interactive computer could make another 
qualitative boost — along the lines of language and writing and printing — both 
externally and internally. 


Most of the world — both in general and in computing — have yet to catch up with 
and understand these ideas. Most adults — even in computing — use computers very 
weakly across the board. 


The possibilities for children and the adults they will become have been particularly 
not understood ("quite botched” would be the technical term here). 


Still, I think a better — if more cryptic — question for this would be: “Is our capacity
for X limited by our capacity?” 


I think the answer has to be yes, even with computers being able to take up more and
more of the capacity load. At some point we won't be able to be helped by what they 
might be able to do. 


Alan Kay 
Have designed a few programming languages - 3y 


How do I make an AI program that learns?


Here is a very simple but comprehensive example. You can see what is actually going 
on in the learning process for "conditioned learning”, and you can deeply learn how it 
works by making it. See: “How would one explain AI to a complete beginner?”


Alan Kay
Have designed a few programming languages - 3y


Who developed the idea of using vacuum tubes in computers?


Take a look at the legendary Tommy Flowers, who was one of the earliest to make 
computing machines using vacuum tubes. (Tommy Flowers - Wikipedia)


(Prepend). It's worth saying a few words about what came before. The “logic of true 
and false” was written about by Aristotle. George Boole showed there could be an 
algebraic form for logic in his 1854 (and second) treatise, The Laws of Thought
(Wikipedia). This was extremely influential.


Charles Sanders Peirce in the 1870s showed that what we call the “NOR” (neither ...
nor) function can make all the other logical relationships of two valued logic — and
also remarked that you should be able to make a computer using electrical switches 
from these! 
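
The NOR result is easy to check mechanically. The sketch below, in Python with illustrative function names, builds NOT, OR, AND and XOR out of nothing but a two-input NOR and compares each against its truth table.

```python
from itertools import product

def nor(a, b):
    """The single primitive: true only when neither input is true."""
    return not (a or b)

# Everything below is built from nor() alone.
def not_(a):
    return nor(a, a)

def or_(a, b):
    return not_(nor(a, b))

def and_(a, b):
    return nor(not_(a), not_(b))

def xor_(a, b):
    return nor(and_(a, b), nor(a, b))

# Check each derived gate against Python's own operators on all inputs.
for a, b in product([False, True], repeat=2):
    assert not_(a) == (not a)
    assert or_(a, b) == (a or b)
    assert and_(a, b) == (a and b)
    assert xor_(a, b) == (a != b)
print("NOT, OR, AND and XOR all agree with their truth tables")
```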


And there was also “the most influential master's thesis in history” (by Claude Shannon
at MIT in 1937-8 ...), which, 50 years after Peirce, rounded out these ideas in a grand
way (including circuits for a 4 bit adder, etc.). This leap from ad hoc methods to a 
much simpler much more complete way of thinking about these complexities made an 
enormous practical difference to subsequent work (analogous to the invention of 
calculus in the physical sciences). 
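
To give a feel for what "circuits for a 4 bit adder" means, here is a sketch in Python of a ripple-carry adder built only from AND, OR and XOR on single bits. It illustrates the general technique of composing an adder from logic operations and is not a transcription of the circuit in Shannon's thesis; the function names are hypothetical.

```python
def full_adder(a, b, carry_in):
    """Add three single bits; return (sum bit, carry out)."""
    partial = a ^ b
    return partial ^ carry_in, (a & b) | (partial & carry_in)

def add4(x, y):
    """Add two 4-bit numbers bit by bit; return (4-bit sum, carry out)."""
    carry, total = 0, 0
    for i in range(4):                      # least significant bit first
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= bit << i
    return total, carry

print(add4(9, 5))    # (14, 0)
print(add4(9, 8))    # (1, 1), i.e. 17 does not fit in 4 bits
```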


The notion of using vacuum tubes for various computing functions is much older than 
the first whole computers to use them — for example: Eccles and Jordan made a
“flip-flop” memory element from vacuum tubes ca. 1918. (Flip-flop (electronics) - Wikipedia)


In another answer here, Petar Pandzharov mentioned John Vincent Atanasoff
(Wikipedia) in 1939 for an “almost a computer” (a pretty amazing set of inventions
actually). These ideas greatly affected the later Eniac by Eckert and Mauchly. 


The main problem with vacuum tubes was that they burned out all too rapidly, and 
this would be a constant problem in even small computers which would require 
several thousand of them (some would always be burning out). And then there was 
size, heat, power, annoyingly high voltages for the plate circuits, and the need for 
quite a few additional components to “help” each vacuum tube with regard to 
voltages and currents, etc. 


Tommy Flowers in the 30s realized that the burnout of vacuum tubes (as with electric 
lights) came primarily from the thermal shock to the filaments, especially when turned 
on. He found that keeping some current moving through the filaments (never turning 
off the tubes) and “easing up the margins” (as they used to say) would keep them 
alive much longer. 


Flowers was also an amazing combination of a genius and "be able to do everything 
well quickly” (my favorite “total-hardware-software-process-guy” in the Tommy 
Flowers tradition was Chuck Thacker at Xerox Parc). 


These insights were key in the groundbreaking work Flowers did for the Bletchley Park 
codebreaking machines in WWII in the UK. When doubts arose about whether vacuum 
tubes could possibly be stable enough to be useful, Flowers argued not with words 
but just by quickly making and presenting the first of a series of vacuum tube driven 
computing engines. 


(And, yes, these machines did use the older Eccles & Jordan flip-flop circuits in 
Flowers’ designs for the Bletchley Park machines, including for Colossus in 1943 — 
Flowers and the Bletchley Park people were readers and did not waste time trying to 
reinvent things that had already been long invented ...) 


Alan Kay
Have designed a few programming languages - 3y


How do programming language designers manage language complexity? 

A specific rather than general answer. For Smalltalk at Xerox Parc in the 70s, the 
approach was very Lisp-like: create a universal building block, implement that, and 
then deal with the complexities of designing and making “useful features”, i.e.
everything that one uses in Smalltalk would be considered “library” in more
feature-laden languages.


This did not banish complexity, but it did make it both more tractable and also more 
flexible as the learning curve progressed. It essentially spreads out the design of what 
programmers will use over much longer and deeper learning curves via building real 
systems in it. 


In order for this to work, the underlying implementation had to include quite a few 
things that (say) a language like C does not supply. This can add complexity (but really 
good work by experts here makes a huge difference in the overall simplicity as 
presented to programmers). 


And the collection of facilities presented to programmers (all of which are actually 
extensions to the simple general kernel) have to also be carefully designed and made 
by experts. The more powerful and easy the extension facilities, the more careful and
expert the designers have to be.


This could wind up in a mess, as with more conventional approaches. The two main 
forces that turned these properties into big wins were (a) the dynamic late binding 
and definitional scheme made the creation of a new kind of Smalltalk “relatively easy" 
(we did 4 main ones in about 8 years), and (b) we had a truly superb system designer 
and builder in Dan Ingalls, who both was a co-inventor of the approach, and could 
really make it work. 


This allowed a lot of flexibility and many real things to be made, but then every several 
years, we could also take stock of what we thought we'd learned and compare this to 
the gracefulness of the language at that point. When there was enough of a
mismatch, we would do a whole new system (but build it using the old one as
environment until it could be cut free and stand alone).


Alan Kay
Have designed a few programming languages - Updated 2y


How were computer programs and languages like COBOL installed into
computers back in the 50s before software?

There was always “software” for “programmable computers” because this was the 
reason such computers were made (my favorite “first” is Maurice Wilkes’ EDSAC in 
1948; the Manchester people would quibble a bit).


There would usually be a “bootstrap” sequence of (not a lot of) instructions that the 
CPU could be pointed at. Sometimes these would be keyed in from the console, 
sometimes they would be in the form of a ROM of some kind (diodes, switches, wires, 
etc.). These instructions could just read a peripheral of some kind (paper tape or card 
reader) to get a few more instructions into the machine, and it would then execute 
them to get more happening. 


These would then be enough to read the rest of the tape or deck of cards to get the 
program and then data for the job to be done. 
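
The staged loading described above can be imitated with a toy machine in Python. Everything here is an assumption made for illustration: the instruction names, the memory layout and the single `COPY_TAPE` instruction (which stands in for the small read loop a real loader would contain) do not correspond to any actual computer.

```python
def run(memory, tape, pc=0):
    """Run a toy machine until HALT and return what the program printed."""
    reader = iter(tape)                 # the paper tape or card reader
    printed = []
    while True:
        op, *args = memory[pc]
        pc += 1
        if op == "READ":                # one tape word -> memory[addr]
            memory[args[0]] = next(reader)
        elif op == "COPY_TAPE":         # rest of tape -> memory[addr], addr+1, ...
            for offset, word in enumerate(reader):
                memory[args[0] + offset] = word
        elif op == "JUMP":
            pc = args[0]
        elif op == "PRINT":
            printed.append(args[0])
        elif op == "HALT":
            return printed
        else:
            raise ValueError(f"unknown instruction {op!r}")

memory = [("HALT",)] * 32

# Stage 0: the few instructions keyed in by hand (or wired as a ROM).
memory[0:3] = [("READ", 10), ("READ", 11), ("JUMP", 10)]

tape = [
    # Stage 1: a slightly more capable loader, pulled in by the bootstrap.
    ("COPY_TAPE", 20), ("JUMP", 20),
    # Stage 2: the actual job.
    ("PRINT", "payroll started"), ("PRINT", "payroll done"), ("HALT",),
]

print(run(memory, tape))   # ['payroll started', 'payroll done']
```

The hand-entered part knows only how to fetch two words and jump to them; everything else arrives from the tape.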


For example, here is just such a “dead start” panel on the Control Data 6600 ca 1964-5 
using manual switches to set the bits of the instructions: 


A close up of this panel: 


The 6600 had 60 bit words, with multiple instructions for each word. But it also had 10 
“peripheral processors” for I/O etc with 12 bit words and instructions. The “dead start” 
code above would be executed by one of the PPs, and this would then continue the 
bootstrap sequence indicated by these 12 instructions. 


The first COBOL was done by Grace Hopper, Harold ‘Bud’ Lawson, etc., for the Univac 
II computer, which only had 2K words of RAM — 12 *decimal* digits a word for data,
and 2 6-digit instructions per word for programming — and could have quite a few 
magnetic tape drives. It took about 60 “passes” using the tapes as 
secondary/swapping memory, etc., to compile a COBOL program. 


So: the COBOL (and FORTRAN, etc.) compilers never got actually loaded into memory, 
but just the parts of them needed for the parts of the compilation task being done. 
According to Bud Lawson, the first time an HLL program was compiled on two 
different machines from different manufacturers was in Dec 1960 for COBOL on both 
the Univac and the RCA computer. 


Alan Kay
Still trying to learn how to think better - Updated 3y


Was computing dumbed down by the arrival of computer science in 
academia? 



This is an interesting question! My first reaction was “absolutely not, just the opposite”
... but this is the old-fogey-from-the-60s reaction. This is because — pound for pound
— by far the best “real computer science” * in the 60s was being done in the major 
universities around the world (e.g. in the UK: Cambridge, Manchester, Imperial, 
Edinburgh, etc.; in the US: Penn, MIT, Princeton, CMU, Illinois, Stanford, Berkeley, 
UCLA, etc.; as well as major universities in Europe: ETH, Eindhoven, etc.). 


I then started pondering what has happened since computing in academia “went
wide” — especially in the US — starting around a little before 1980, moving from a
few top places — MIT, CMU, Stanford, etc. — to pretty quickly becoming a subject
matter in the ~4000 or so degree granting four year colleges/universities in the US.


My thought at the time was “How can they be doing this? There aren't anywhere near 
4000 top people in computing in the world — actually many many fewer. Where are 
they going to get professors of high enough caliber to offer a real college level 
curriculum in ‘real computer science’?” 


Part of the impetus for this rush was another transition that had started in the 60s: the 
catalysis of the baby boomers and other factors starting to move universities to 
entities much more like businesses. A good book about this is “Imposters In The 
Temple”. 


This was in full swing and was part of the boom in "CS" departments, regardless of 
quality. 


Another large factor in “business orientation” was a stronger transition in universities 
toward vocational training rather than “in depth understanding of complex issues” 
that was the hallmark of education. And part of this had to do with the differential 
lopsided inflation in parts of human lives (e.g. especially real estate), that got more 
and more students to aim more for jobs than enlightenment. 


I remember being shocked when Stanford CS — over the protests of many of its
professors — decided (I think because of “encouragement” from the provost) to have 
Java be the initial programming language for incoming students. This is an enormous 
university with an enormous endowment, but now catering to perceived pulls from 
business rather than having its own special take on actual “sciences of computing” 
and actual “engineering disciplines for software”. 


I don’t know enough about the whole sector of university computing in the US today
to answer the question directly. 


My own samplings over the years — via talks and visits to many university 
departments — have given me the prejudice that the students most definitely know 
almost nothing about the history of computing, and especially computer science. 


Right away, this is a clue that “computing is not being taught as a real field” (for 
example, in Physics you not only learn F = ma but who came up with it). 


In this limited anecdotal survey, I found no student or professor had taken the trouble
to type “Engelbart” into Google to find out what Engelbart had really done, what he 
really thought, etc. 


This indicated a real lack of both perspective and intellectual curiosity. 


It also really pissed me off (a bunch of us had put quite a bit of effort in the 60s and 
70s to invent personal computing and world-wide networks primarily to make it easier 
to become knowledgeable and then enlightened ... ) 


I found no student who could define “computer science” as anything but something
like engineering. The term “science” was kind of like the use in “Library Science” — 
there was no connection to what “Science” has meant since the 17th century. 


This deadly embrace is found in NSF (I spent quite a few years on advisory boards 
there). It deeply and negatively affects high school curricula. And now has invaded the 
elementary school years with really grotesque and substandard ideas about 
computing. 


My answer to the question: It is quite clear that “computer science” has been 
disastrously dumbed down. I think academia can be blamed for a lot of the problem
(because part of its job is “keeping the flame” preserved against the general dumbing
down out in the world). I think business can be blamed for “being dumb”: to the
Dunning-Kruger level of not thinking they are, and then inflicting this on the general 
population in various ways, including schooling at all levels. 


From this old fogey's perspective, it is really a shame. 


* Prompted by Will Razen’s comment below, here is how we thought about 
“Computer Science” in the 60s, when the term was put forth as an aspiration and a 
question, rather than a done deal. 


Science is the attempt to detect and gather phenomena and to explain them via creating
models (theories) of some kind that produce similar phenomena, and to do this in 
ways that try to get around what is weak about our senses and abilities to think. 


So, if a bridge is built, it exudes phenomena, and can be studied, modeled and better
understood. I.e. there can be a “science of bridges” (and of “structures generally”). In
the large, this creates “Sciences of the Artificial” ** i.e. sciences that spring up around
the artifacts that animals, mostly we, make.


A wonderful thing about “The science of bridges” is that an improved understanding 
and better models for “bridgeness” can be in turn used to design and make better 
bridges, which have properties of their own to be studied ... 


The “sciences of artifacts” are a most delicious art form and pursuit for those who love 
and are called to the upward lifting adventure of understanding leading to making 
leading to understanding leading to ... 


Most of the sciences — whether of nature or artifacts — will use mathematics of some 
kind — often newly invented — to help with the modeling process. As in Physics, this 
shouldn't be confused with the science part of things. 


When Alan Perlis was asked what “A Science of Computing” could mean, he said it was 
“The science of processes; all processes”. He could as well have said “The science of 
systems; all systems” (he meant this to be the same answer). 


This recognized that algorithms, etc., are a tiny part of what computing is about.
Computing is really about understanding, inventing, and building systems. As in many
cases in the past in science, when existing math is not up to the task, new math has to
be invented. In this case, one of the needs for new ways to think about what's going
on stems from the degrees of freedom available and the addition of the dimension of
time.


The degrees of freedom and extent of dynamic relationships in desired artifacts
generally mean that they have to be debugged rather than proved. (And there are 
parts of math where proofs have some of the same quality — all proofs have to be 
debugged, some proofs actually require modeling them on a computer to debug 
them.) 


Some of the earliest pioneers realized that the computer was "meta” in that it could be 
a great vehicle for modeling ideas about itself, so that much of the new math that was 
needed could be “extracted” from the “process space” itself. Many computer 
“theories” are models of processes written as running systems that can be debugged 
and explored.*** 


** see the book “Sciences of the Artificial” by Herb Simon (both a Turing Award and 
Nobel Prize winner, and one of the founders, with Turing awardee Alan Perlis, of the
CMU Computer Science Department). 


*** We are sometimes asked how Xerox Parc could have been so inventive and 
productive in the 70s with just a few dozen computerists. One of the answers is lurking 
in the above. We thought in terms of systems of processes, made models of these, 
and ran these models on computer architectures we invented and constructed. I
would call what we did the virtuous uplifting spiral of “Computer Science”
understanding things to allow various “Computer Engineerings” to make new things,
and so on.


Alan Kay
Still trying to learn how to think better - 3y


Does the Chinese Room prove that true artificial intelligence will never be 
developed? 


A less misleading situation is to ask whether there is something not “mechanical” — or 
something “intrinsically intelligent” — about atoms that accumulates into 
“intelligence” and “consciousness” when the atoms are formed into a human brain. 


A little more than 100 years ago most people, including many scientists, thought
that “life” just had to be about more than the organization of simple building
blocks. This was called “vitalism”.


The biologists who did think that life was “only special organizations” — and did the 
work that started to show this — have been quite vindicated by the last 100 years. 


The debates about being able to organize simple components into highly intelligent
processes have followed much the same lines. There’s no reason to go to the
trouble of the “Chinese Room” as added confusion. Some people want to think 
intelligence is something so special about human beings that it transcends science. 


To answer the question here: the “Chinese Room” proves nothing at all one way or the 
other. 


The history of learning how life works is “very suggestive” that intelligence is the same 
kind of idea: special organizations of parts that do not at all have to be intelligent into 
systems that manifest intelligence. 


And, perhaps more interesting — especially philosophically — “life” is suggestive that 
“consciousness” is also a manifestation of processes organized to be able to look at 
other processes. 


A philosophical argument is “do we get intelligence and consciousness if we can 
simulate every atom in a brain?”. I think the answer is “Yes”, but that it is a lot more
fun and interesting to consider inventing solutions that are closer to realistic 
engineering. 


Lord Kelvin was famous for both his contributions to science (e.g. thermodynamics) 
and his unfortunate behavior of projecting his beliefs and desires into areas that are 
best considered by science and engineering (e.g. in 1895, he declared that “heavier 
than air flight is impossible”, etc.). Since birds are heavier than air, and do fly, he was 
presumably (we hope!) gloomy about the scaling of flight. 


So: what we should be interested in here is the science and engineering of 
“intelligence” and “consciousness”. Like finding out how life works, it is quite likely 
both possible and feasible, and needs to be done to settle the issue. 


From the practical standpoint, it is hard to imagine that solutions will not be more 
intelligent and reflective than human beings right from the get-go (we are actually 
terrible thinkers, given what thinking is all about). This would open up more questions, 
but I think pondering about this will require a lot more depth than Searle has shown.


Alan Kay
Still trying to learn how to think better - 3y

What is an interesting and unusual historical computer architecture? Was it
definitely a dead-end or could it possibly make a come-back? 

A really interesting early architecture was Wes Clark's TX-2, the test
computer at Lincoln Labs for the SAGE air defense system. This was also the machine
that Ivan Sutherland used to invent a superset of modern computer graphics in 1962.
One of several key features was its interleaved bottom level zero-overhead tasking
architecture with about two dozen separate program counters, all tied to different
conditions. This not only avoided normal interrupt processing, but also allowed many
processes that were like HW peripherals to be emulated by programs (in fact, the
system of graphics display, pointing device, knobs, etc. that Sutherland used was
created by programming, using the multiple PCs as though they were dedicated
machines).


The classic answer here would be the Burroughs B5000 architectures — Bob Barton 
was the main inventor of most of the ideas on this machine — which used tag 
protected memory, polymorphic operations, “byte-codes” that were the semantics of 
higher level languages, safe multiprocessing (and multiple CPUs), etc. 


A separate dimension would be architectures to allow emulation of high level 
operations as though they had been efficiently included in the hardware. In the 
50s, 60s, and 70s this was done by programming dynamically reloadable microcode.


At Xerox Parc we combined the three ideas above in Chuck Thacker's Alto computer,
which had 16 program counters, zero overhead task switching at the bottom, and
reloadable fast microcode. The microcode was used both to make display processes
for the bitmap display, the GUI, etc. and to make B5000-like emulated higher level
machines for languages such as Smalltalk, Mesa, Lisp, etc.
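As a rough software analogy for "many program counters, each tied to a condition", here is a minimal Python sketch. It is not how the TX-2 or Alto hardware worked: the task names, the `make_task` and `run` helpers, and the "lowest index wins" priority rule are all assumptions made for illustration. Each generator keeps its own suspended position, which stands in for a private program counter, so switching tasks needs no saving or restoring of state.

```python
def make_task(name, log):
    """A task whose suspended generator frame plays the role of its own program counter."""
    step = 0
    while True:
        log.append((name, step))  # one "microstep" of work for this task
        step += 1
        yield                     # give the processor back; position is kept for free


def run(tasks, wakeups):
    """Run one step per cycle of the highest-priority task whose condition is raised.

    tasks:   list of generators, index 0 = highest priority (an assumption)
    wakeups: one set per cycle holding the indices of tasks whose condition is raised
    """
    for ready in wakeups:
        if ready:
            next(tasks[min(ready)])


log = []
tasks = [make_task(n, log) for n in ("display", "disk", "emulator")]
# The emulator is always ready; display and disk raise their conditions now and then.
run(tasks, [{2}, {0, 2}, {2}, {1, 2}, set()])
print(log)
```

The emulator task resumes exactly where it left off after the display and disk tasks have had their turns, which is the property that lets one processor behave like several dedicated machines.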


Today such an architecture might be done with hybrid FPGAs (but done better than 
the current Intel offerings). 


The basic key to combining flexibility with speed generally has much more to do with 
how the memory and processing units relate, especially at the lowest levels. 


Alan Kay
Have designed a few programming languages - 3y

Is computer program portability more important in today's mobile 
technology? 


I've always thought it was tremendously important in every era that I've been in 
computing. “Software wants to run on everything” — a huge percentage of computer 
people seem to bend over backwards to prevent this, but I think it’s a mistake.


Alan Kay
Have designed a few programming languages - 3y
Is it possible to program a word-processor or text-editor? How would I go
about doing this? 


Do you mean to make one from scratch? (They all have been made from scratch 
originally — more can be made). 


Do you mean to have a programmable word processor? (E.g. Emacs is a famous one.
Microsoft Word used to be programmable, though I don’t know if it still is; it was
quite programmable in its original version as “Bravo” at Xerox Parc.)


Etc. 


Alan Kay
Have designed a few programming languages - 3y
What thought process would lead one to invent object-oriented 
programming? 
(Per Paul Hudson: “Maybe Alan Kay would answer. Since he did!") 


Well, OK. 


In 1992, I was asked by the ACM to write a history of what I actually did for their
second “History Of Programming Languages” conference. There are several versions
online, including at Bret Victor's site: “The Early History Of Smalltalk”. This should
satisfy those who want more detail.


For the purposes here for Quora, (a) I did coin the phrase “object oriented
programming” for what I was working on in 1966, but (b) I thought of what I was
doing as a kind of “rotation” and generalization of an idea I had seen in a variety of
forms since my days as:


• a biologist
• then as a programmer in the Air Force starting in 1961: a way to make a
“semantic file system”
• then in the hardware of the B5000 computer
• in the large, as processes in “multi-programmed” and “time-shared”
operating systems
• most startlingly in Ivan Sutherland's Sketchpad
• in a more prosaic practical form in Simula
• and as done with entire computers on networks (as in the upcoming
ARPAnet by my research community).


What hit me in Nov 1966 was ridiculously, absurdly simple: “that, if you can have 
enough computers which can intercommunicate, this is all you need to define anything 
that can be done on a computer by any other means”. This is because each computer is 
universal, etc. 


Putting aside the practical difficulties (I was basically a lazy mathematician, so it was
fairly easy to detach from the industrious pragmatist I had to be as a programmer),
this was very attractive as an idea, because it scaled much better than the traditional
ideas of procedures and data structures as building blocks.


And it was so much cleaner, and so much more amenable to whole systems designs. 


And so amenable to thinking in terms of “designing and programming simulations". 
This last was from the two systems I saw within a week of each other that pushed me
over the edge: Sketchpad and the first Simula — both of which were essentially set up 
to simulate complex entities containing many interrelated processes. 


Another very attractive feature of having everything being made from “semantic 
software computers intercommunicating via messaging” is that some ugly properties 
of “data” could not only be fixed, but even eliminated. For example, standard 
hardware and programming languages which map closely to them can give rise to 
simple arrays, but not to “data structures” that are just a little more complicated — 
like sparse matrices — and stacks and queues, etc. These are easy to do if these are 
instead “processes” and get to decide internally what the “assignment arrow” should 
mean. 
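The point about letting a "process" decide internally what assignment means can be sketched in Python. This is only an illustration, not code from Smalltalk or any Parc system: the `SparseMatrix` name and its dictionary-backed storage are assumptions. From the outside it is indexed and assigned like a plain 2D array, while the object itself decides that zeros are never stored.

```python
class SparseMatrix:
    """Looks like a 2D array from outside; stores only the non-zero cells inside."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self._cells = {}  # (row, col) -> value, non-zero entries only

    def _check(self, key):
        r, c = key
        if not (0 <= r < self.rows and 0 <= c < self.cols):
            raise IndexError(f"index {key} is outside {self.rows}x{self.cols}")
        return r, c

    def __getitem__(self, key):
        # Reading an unset cell answers 0 without ever having stored it.
        return self._cells.get(self._check(key), 0)

    def __setitem__(self, key, value):
        # The object decides what "assignment" means: zeros free storage.
        key = self._check(key)
        if value == 0:
            self._cells.pop(key, None)
        else:
            self._cells[key] = value

    def stored(self):
        return len(self._cells)


m = SparseMatrix(1000, 1000)
m[3, 7] = 42
m[500, 500] = 0   # accepted, but nothing is stored
print(m[3, 7], m[0, 0], m.stored())
```

Callers never learn whether the million cells exist; a stack or queue could be offered behind the same kind of interface.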


And this can lead to something much nicer than “data” that still has state but no race 
conditions on it. For example, inside the object it can decide to version the histories if 
it wants, it can do “atomic transactions", it can only allow viewing when all of its 
relationships are consistent. And many more. The data idea was always a bad one, and 
this new semantic building block would allow objects to progress through time — and 
“learn” etc. — but would be much safer. 
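A small sketch of an object that keeps its own history and only ever shows consistent states might look like the following. The `VersionedAccount` class and its "balance never goes negative" rule are hypothetical, chosen only to make the idea concrete, and the sketch is single-threaded: it illustrates the interface, not a concurrency mechanism.

```python
class VersionedAccount:
    """Holds state privately, versions every change, and exposes only consistent views."""

    def __init__(self, balance=0):
        self._history = [balance]  # every committed state, oldest first

    @property
    def balance(self):
        return self._history[-1]

    def transact(self, *deltas):
        """Apply all deltas or none of them (an all-or-nothing transaction)."""
        new = self.balance
        for delta in deltas:
            new += delta
            if new < 0:
                # Nothing has been committed yet, so the old state is still intact.
                raise ValueError("rejected: balance would go negative")
        self._history.append(new)

    def history(self):
        return tuple(self._history)  # read-only view of past versions


acct = VersionedAccount(100)
acct.transact(-30, +10)
try:
    acct.transact(-50, -50)   # second step would overdraw, so neither step applies
except ValueError as err:
    print(err)
print(acct.balance, acct.history())
```

There is no setter, so outside code cannot put the object into a state the object itself did not agree to.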


And then there is the idea of simulation itself, where the whole idea of “data” and 
“state” starts to get eclipsed by “competent objects” that can cooperate, much more 
like biological cells and human societies can. 


So, the thought processes that led to this were basically “systems thought processes” 
that were about being able to easily define systems of processes: any kinds of systems 
of any kinds of processes. 


This will not resemble much of what most readers will associate with OOP — in the 
sense of languages like C++, Java, etc.


Quite a bit of the disconnect is in point of view/perspective. C++ and Java etc. use 
objects mainly to define new things that are very like data structures, and the 
programming that is done is generally very data structure like (e.g. “setters” turn any 
kind of entity back into a data structure that can be imperatively munged by anyone). 
Technically, this is actually “Abstract Data Structures” and though a subset of what can 
be done with objects, is a divergence from the intent. 


The picture I delineated above allows a good systems designer to design a good
system and make it, even if very large. It's obvious that much of the benefit of the
idea will be lost if skilled systems design is not employed.


An important side point is that the strength and generality of the definitional aspects 
of this paradigm can be quite independent of programming style (this is the central 
key of what “simulation of systems” means). 


An object can perfectly simulate a function, procedure, a data structure, etc., so 
everything old can be covered completely, but it can also simulate very smart agencies 
and agents, to provide much stronger ways of doing things. 


The flaw in how things have played out is that very few in computing actually put in 
the effort to grok the implications of “universal scalable systems of processes”, and 
instead have clung to very old and poorly scalable ways to program. 


At Parc, we got a tremendous boost in what we could conceive and write in very few
lines of code because we tried to use as much as possible about what we understood
about this new way to organize and program complex systems. (Today, there is more
understood about this, but even the first time around, we were constantly amazed at
what was possible.)


Alan Kay
Still trying to learn how to think better - Updated 3y

How do you see Linus’ prediction failure of parallel computing? 

I'm not sure that he is predicting the failure of parallel computing. The context of the
remarks seems tiny, and seems even limited to specific tiny contexts (for example
using “locks” at all is a very bad way to try to deal with race conditions, etc.).


I just did a Google search that got back to me in about 1/4 second after considering
millions of items of indexed information spread around the world (this seems like 
parallelism working to me!) 


The Internet seems quite parallel ... 


But the simplest way to ponder this both philosophically and pragmatically is to note 
that biological neurons have a cycle time of about 5 milliseconds, yet we can do quite 
a bit of thinking and decision making in from about 1/3rd to 1/2 second. So: “a lot" 
can be computed by our brains in from 30 to 100 clicks. 


Real-time brain scans of metabolism while thinking reveal that hefty percentages of 
the 86 billion neurons in our brains are doing something related to the thinking task. 


Also, if you are familiar with molecular biology (and computer people should take
the trouble to learn how all this works), each one of the 10 trillion or so cells in our
body has billions of parallel pattern matches and actions. Some of these happen as
rapidly as 1 microsecond. These converge to produce all the life cycle functions of
each cell, including making more cells. The cells themselves started as a single
fertilized ovum, and 45 cell divisions or so later, a baby was produced (it’s worth
pondering the difference between this and the log to the base 2 of 10 trillion ... what
do you think those extra cell divisions were used for?).
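The comparison with "the log to the base 2 of 10 trillion" is quick to work out. The sketch below uses only the two figures given in the text (10 trillion cells, about 45 divisions) and assumes, purely for the arithmetic, that every division doubles the cell count.

```python
import math

CELLS = 10e12     # "10 trillion or so" cells, the figure used in the text
DIVISIONS = 45    # "45 cell divisions or so", also from the text

doublings_needed = math.log2(CELLS)   # doublings needed to reach 10 trillion
cells_after_45 = 2 ** DIVISIONS       # count if every one of 45 divisions doubled it
extra = DIVISIONS - doublings_needed  # divisions beyond the bare minimum

print(f"log2(10 trillion) = {doublings_needed:.2f}")
print(f"2**45             = {cells_after_45:.3e}")
print(f"extra divisions   = {extra:.2f}")
```

So roughly 43.2 pure doublings would already reach 10 trillion, and 45 doublings would give about 35 trillion; the gap of nearly two divisions is the difference the text asks the reader to ponder.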


This sounds to me like “parallel computing” does work! and on scales that most 
computer people don’t do much thinking about. 


Perhaps the poor job done in traditional HW and SW on architecture in general and
parallelism in particular is more reflective of the abilities and predilections of most
of the computerists working in those areas.


One way to try and learn some really interesting things is to get an FPGA plug-in box
for your computer and start to make highly parallel architectures with it. It's a lot
easier all around with (a) thinking parallel from the get go, and (b) realizing that many
difficulties with parallel computing are actually due to the traditional von Neumann
architectures which separate memories from processing. You can easily commingle
these in an FPGA.


Alan Kay
Have designed a few programming languages - 3y


Which are the earliest real-time text editors? 


It would be great to track down a precision answer to this. 


There were very early text editors in the 50s using various flexowriters, teletypes, and 
modified IBM electric typewriters. 


On Whirlwind I, the Laning and Zierler “George” interactive programming system ca.
1954 had limited text editing in its interactions.


I would look to the pathway between Whirlwind I at MIT and the SAGE project at
Lincoln Labs, etc. 


Also, the Bendix G-15 in the mid-50s (by Harry Huskey) had a TTY interface, and 
perhaps a text editing program. 


There was “expensive typewriter” at MIT in 1961 on their first PDP-1. CTSS at MIT used 
“typewriter” terminals and had text editing. 


I can't put my finger on the earliest display text editor. Wes Clark's LINC in 1962 had
one (the LINC is a good candidate for the first real personal computer). Hard to 
believe somebody didn't do one for the PDP-1 ca 1960. 


Let's see if we can track this down better ... 


Alan Kay 
Still trying to learn how to think better - 3y 


What bottlenecks need to be addressed in order for the real computer 
revolution to happen? 

I don't think there is “one” real computer revolution, but the general premise that “it”
or “these” haven't happened yet does help pondering (and I also think that where
things generally are today is so low compared to what could be done that the
phrase “the computer revolution hasn't happened yet” is quite reasonable).


Most of what I meant when I coined the phrase in the late 70s or 80s was to counter
the really low hurdles that the 8-bit micro hobbyists had set for themselves. This was
in danger of spreading with the 8-bit micros (yep, it did), and I started going around
with some movies from Parc, and a talk: “Don't settle for anything less!”. (This wound 
up having very little effect, even at Apple.) 


As usual the main bottlenecks are in various “normal forms” of human thinking about 
most things — which is to accept what is in the world as the standard, and if any 
improvements seem called for, to make them incremental with the least effort 
possible. 


The real deal is that the invention of computers is in the same class as the invention of 
writing, of mathematics, of science, even of the printing press (it's a qualitatively new 
amalgam of all of these with additional magic they don't have). These are not 
increments, but qualitative leaps into very different contexts that have created very 
different futures. 


A normal pervasive human trait is “instrumental reasoning”: basically to hold onto 
existing goals and to try to use new ideas and new tools to further those goals. Only a 
small percentage will be able to see very different possibilities and goals that are 
implied by new ideas and new tools. 


Just to check: I’m now going to the Wikipedia article about Seymour Papert’s Logo, 
the first programming language for children (and an excellent design): “Logo
(programming language) - Wikipedia”.


And, as always, there are some words, a few links, but astoundingly in this article — 
really astoundingly at any time from the birth of the web and web browser in the early 
90s to especially now — which is to be read on a computer that is running, there is no 
way to try out any of Logo's features. For some reason, the Wikipedia people thought
that something like an encyclopedia on an active computer should be just like a paper
version (but with a few hyperlinks).


This is not just lack of imagination, but a crucial lack of any understanding of what a 
computer is, and what personal computing has to include. 


And note that the web and the web browser were done years *after* Hypercard, which
first showed a comprehensive way to do all of this for end-users.


How did it get lost? (It didn't, the web people didn't bother looking around at all.) 


And note that the web and the web browser were done 25 years after Engelbart’s big 
demo in 1968. (Again, what happened to make the field brain dead over time?) 


This is just the tiniest example of the problem. 


Another — staying with Wikipedia — is that it doesn’t have a half-way reasonable 
authoring system even for its limited goals. 


And the authoring that does exist via the web browser is quite terrible for end-users, 
even after 25 years, and even after better authoring systems had been done before 
the web in the 70s and 80s. (Why isn’t Google Docs — for example — above a 
reasonable threshold for authoring even the limited media it deals with?) 


All this is despite what's happened with Moore’s Law since the mid-60s, and that the 
not well designed language Javascript runs many orders of magnitude faster than the 
fastest hardware done at Xerox Parc (and the amount of memory and storage is 
astronomical in comparison). 


This means really wonderful authoring systems that can be used fluently by both
end-users and by pros are quite possible. And “authoring” can include the safe
wide-ranging authoring of processes, not just simple media.


Because the larger group of end-users didn't understand what the “computer 
revolution” should be about, it was the moral duty of people making computer 
artifacts to lift them. Instead, “normal” kept on being redefined lower and lower and
was aimed at whatever people might buy rather than at including important parts of
what people actually need, as well as want.


Some of these needs require much more learning on the part of the end-users than 
marketing people want to deal with (very like reading and writing). 


But we just can't retreat back to cave paintings just because reading and writing is 
difficult — we have to learn how to teach the new things that lift. 


Hypercard is a nice example to point to: according to Apple's followups, more than 4 
million separate people of all ages and kinds made their own scripted applications and 
applets. That was a revolution! But for a very short time (Apple itself failed to 
understand Hypercard, and dropped it). 


Where is “the equivalent of Hypercard in 2020"? 


Is anyone in the computing field even asking this question? Or trying to find it? 


Alan Kay
Have designed a few programming languages - 3y

If all knowledge of programming and computer science was to be gone, 
what 10 books/papers could restart the field and bring things back or make 
things better? 

This question could be about a cold start to computing — no computers around at all 
— or it could be about “the computers and networks we now have are around, but no 
one knows how to program them”. Or something somewhere in between. 


This makes a difference because some of how programming got started — and some 
of what people still think programming is — came from the tiny size, slow speeds, and 
limited instruction sets in the late 40s onwards. Old ways of thinking tend to persist 
regardless of validity. 


I have a bit of an urge to try to answer this, but I need more clarity in the question.


Alan Kay
Have designed a few programming languages - 3y


How many programmers were in the world in 1964? 


It would be great to find definitive answers. I just sent an email to Fred Brooks to get
his opinion. 


However, it is safe to say — for 1964 — thousands at least, and most likely 10s of 
1000s. 


This is partly because of where IBM had gotten to by 1964: the somewhat backroom 
project of the 1401 to replace punched card accounting machines (PCAM), was very 
successful by then (I recall that they eventually made about 12,000 of these machines 
through the 60s). 


Wikipedia says 2000 1401s were installed by 1961 and these were 1/4 of all the 
computers in the world at that time (so they estimate 8000 computers by 1961). 


Just from the stated number and dates of the 1401 at this time (actually the year that I
started programming on a 1401 for the US Air Force), I think we could say that it's
likely that there were more than 10,000 programmers in 1961. 


This is because the 1401 was only inexpensive to lease relative to other computers — 
in today’s dollars a small system of mainframe, card reader, printer (without tape 
drives) would rent for about $22K per month. They would be run 24 hours a day within 
large PCAM shops and would likely have 5-25 programmers (there were about 20 or 
so in my Air Force shop for the 1401). I would guess close to 10,000 1401
programmers alone in 1961. 


Wikipedia goes on to say that almost 10,000 were installed by 1964-5, and constituted 
1/2 of all computers in the world (so they think: perhaps 20,000 computers in the 
world in 1964-5). 


Taking just the 1401 numbers and dates at face value indicates many more than
10,000 programmers in the world by 1964.
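The back-of-envelope estimate above can be written out explicitly. The installed counts (2000 by 1961, almost 10,000 by 1964-5) and the 5-25 programmers per shop are the figures quoted in the text; treating each installed 1401 as one shop, and rounding "almost 10,000" up to 10,000, are simplifying assumptions of this sketch.

```python
def programmer_range(machines, per_shop=(5, 25)):
    """Low and high programmer counts, assuming one shop per installed machine."""
    low, high = per_shop
    return machines * low, machines * high


est_1961 = programmer_range(2_000)    # 1401s installed by 1961 (Wikipedia figure in the text)
est_1964 = programmer_range(10_000)   # "almost 10,000" installed by 1964-5, rounded up

print("1961:", est_1961)
print("1964:", est_1964)
```

Even the low end for 1961 lands at 10,000 for 1401 programmers alone, and the low end for 1964 is five times that, which is why "many more than 10,000" is a cautious reading of these numbers.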


This early success of the 1401 plus other factors had gotten IBM to try “System/360" 
— launched in 1964 — to see if they could make a large hardware family in every 
price range, and try to unify it with one OS and software suite. This was a breathtaking 
move in any age. 


They bit off more than they could chew, but wound up making a lot of money anyway
(“No one ever got fired for buying IBM”).


Fred Brooks wound up with the responsibility of the OS (OS/360) and wrote the classic 
“The Mythical Man-Month” about the experience and lessons learned. 


The combination of programmers needed for quite a few thousand machines in
the world around 1964, plus the hundreds to thousands who worked for IBM (I can't
remember whether it was 500 or 2000 who wound up on OS/360 alone), certainly
would add up to a lot more than 10,000.


Alan Kay
Have designed a few programming languages - Updated 3y
How could computers develop their own language and take over the Earth? 


Fredric Brown was a terrific science fiction writer in the 40s and 50s. He also loved to
write “half-pagers”: entire sci-fi stories in a few hundred words. Here's a favorite one 
from 1954:


Dwar Ev ceremoniously soldered the final connection with gold. The eyes of a
dozen television cameras watched him and the subether bore throughout the 
universe a dozen pictures of what he was doing. 


He straightened and nodded to Dwar Reyn, then moved to a position beside the
switch that would complete the contact when he threw it. The switch that would
connect, all at once, all of the monster computing machines of all the populated
planets in the universe -- ninety-six billion planets -- into the supercircuit that
would connect them all into one supercalculator, one cybernetics machine that
would combine all the knowledge of all the galaxies.


Dwar Reyn spoke briefly to the watching and listening trillions. Then after a 
moment's silence he said, "Now, Dwar Ev." 


Dwar Ev threw the switch. There was a mighty hum, the surge of power from
ninety-six billion planets. Lights flashed and quieted along the miles-long panel.


Dwar Ev stepped back and drew a deep breath. "The honor of asking the first 
question is yours, Dwar Reyn." 


“Thank you," said Dwar Reyn. “It shall be a question which no single cybernetics 
machine has been able to answer." 


He turned to face the machine. “Is there a God?" 

The mighty voice answered without hesitation, without the clicking of a single relay. 
"Yes, now there is a God." 

Sudden fear flashed on the face of Dwar Ev. He leaped to grab the switch. 


A bolt of lightning from the cloudless sky struck him down and fused the switch 
shut. 


Another way to look at this question is to note that computers don't have to take over 
the Earth, because humans are giving over the Earth to them. 


In other words, worry more about human stupidity than artificial intelligence. 
Still, it's worth looking at the question more literally. 


Staying on the stupid side of things for a minute, note that what ML actually does is to 
“classify/correlate/etc” many inputs into usually fewer outputs. If we tie actions to 
these we will get a goal seeking mechanism. (Take a look at my recent Quora answer 
about Explaining AI To A Complete Beginner. The example uses an extremely simple
correlating mechanism to learn by repetition.) 


We could imagine using a larger version of this to try to buy or sell stocks, or even to 
start a pre-emptive nuclear war. The problem is that correlation/classification is quite a 
bit more noisy than most practitioners want to discuss. For example, here is one that 
has a .993 correlation over 10 years — many in the Pentagon and in the current 
government would very well see this as definitive and would want to push the button. 
But: 


[Chart: “Divorce rate in Maine” correlates with “Per capita consumption of
margarine (US)”, both plotted over the same ten years. Correlation: 0.992558]


Yikes! 


More of these “foolers” at “15 Insane Things That Correlate With Each Other” (thanks
to Mark Killman for the link).
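For readers who want to see how easily a high number like this appears, here is a minimal sketch of the Pearson correlation coefficient. The two short series are made up for illustration (they are not the Maine or margarine figures); the only thing they share is a downward drift, and that alone is enough.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data: two unrelated quantities that both happen to decline.
series_a = [5.0, 4.8, 4.5, 4.4, 4.1]
series_b = [8.0, 7.1, 6.4, 5.5, 5.0]

r = pearson(series_a, series_b)
print(f"r = {r:.3f}")
```

Any two series that trend the same way over a short window will score close to 1, which says nothing about whether one has anything to do with the other.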


To finish, I will put off looking at computers, and instead direct your attention to
something mechanical that has already taken over the Earth by developing languages, 
namely “life”. The three key processes are variation, speciation, and epigenesis. 


The first is due to the extreme noisiness of the chemistry of larger molecules at room
temperatures and a bit higher. The second is due to the possibility of making a thing
that is less noisy than the parts it is made from. The third is that the environment
these processes match to (or don't) also contains the processes and what they make.


So we've got both “unstable” and “somewhat stable” working in environments that
can include the processes and their residues.


It is well worth pondering how these three principles work at every scale from the 
atomic to the planetary. 


Evolving a brain that can create cultures and languages within the cultures is just the 
tiniest part of what has happened. 


Alan Kay
Have designed a few programming languages - 3y


How do I reason that the Single Responsibility Principle (SRP) does not
violate the spirit of the Open Closed Principle (OCP)? 


In theory, I don't like this question, but in practice, I do.


But first a quote from Poincare: “Physicists talk about Physics, Sociologists talk about 
*methods*”. 


He is trying to point out the real and large differences between a deeply developed 
field, and one that is still trying to find out how to be scientific at all. 


When I first started thinking deeply about systems made from intercommunicating
logical computers, I thought much more about the messages than the computers.


The latter were there because they could do anything and everything that a computer 
could do, and so provided completeness all the way down in the semantics. 


But in this way of looking at things, a message send is something looking for a service 
for something, and there is an element of “search” and “retrieval” going on. 


So, as I've said many times previously, I made a real mistake to not think about
what this style of programming should be called. The flippant response to the
question put to me in 1966 was “object-oriented programming”, and this has been a
clear distraction to people actually pondering what is going on, and what *should* be
going on.


The peculiar way that “Smalltalk” happened (which didn't turn out as I had planned
“Smalltalk-71” should) also made things confusing, because it was good enough to
make a big difference at Parc, and we didn't go back to “the way it should have been”.


One of the good things that we did carry through was to have messaging always be 
just requests: every “object” could reject any message. (This idea was also part and 
parcel of the way the ARPA/Parc community did networking, including the Ethernet 
and Internet.) 


And because of this, any message could wind up being a very generic and distributed 
broadcast request. 
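The idea that a message is only a request, which any object may reject and which can therefore be broadcast, can be sketched as follows. The `Obj` class, the `receive` and `broadcast` names, and the tuple-shaped replies are all hypothetical; Smalltalk's actual machinery is different.

```python
class Obj:
    """An object that answers only the requests it chooses to handle."""

    def __init__(self, name, handlers):
        self.name = name
        self._handlers = handlers  # selector -> callable; private to the object

    def receive(self, selector, *args):
        handler = self._handlers.get(selector)
        if handler is None:
            # Rejecting is a normal reply, not a crash in the sender.
            return ("rejected", self.name, selector)
        return ("ok", self.name, handler(*args))


def broadcast(objects, selector, *args):
    """Send the same request to everyone and keep the replies that were accepted."""
    replies = [obj.receive(selector, *args) for obj in objects]
    return [reply for reply in replies if reply[0] == "ok"]


adder = Obj("adder", {"add": lambda a, b: a + b})
printer = Obj("printer", {"show": str})

print(printer.receive("add", 2, 3))
print(broadcast([adder, printer], "add", 2, 3))
```

The sender names what it wants done, not who must do it, so the set of objects able to answer can change without the sender's code changing.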


From this POV, we can see that it is likely that far too many assumptions are being
made in the “principles” above to allow them to be good over time and in the face of
scaling.


But we can do something with “servers” and “messaging” that will scale well in a 
variety of ways. 


Let us suppose that we are trying to write some code in a process to accomplish 
something — and that what we have to work with are only servers and messaging. 


We should be able to see right away that what needs to happen outside of our code is 
that search processes need to find the appropriate servers for our needs. 


In the world of the large, there could be more than one service that could be used, 
and we need to have meta-info that will help the searchers to find the best ones (and 
they will change over time). 
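One way to picture "search processes" that use meta-info to pick among several candidate services is the sketch below. Everything in it is an assumption made for illustration: the `Service` record, the semantic tags, and the use of a single `reliability` score as the ranking criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    provides: frozenset   # semantic tags describing what the service means to offer
    reliability: float    # meta-info a searcher can rank by (hypothetical measure)


def find_best(services, needs):
    """Return the most reliable service whose tags cover every need, or None."""
    needs = frozenset(needs)
    candidates = [s for s in services if needs <= s.provides]
    return max(candidates, key=lambda s: s.reliability, default=None)


registry = [
    Service("fast-spell", frozenset({"spellcheck", "english"}), 0.90),
    Service("poly-spell", frozenset({"spellcheck", "english", "french"}), 0.97),
    Service("thesaurus", frozenset({"synonyms", "english"}), 0.99),
]

best = find_best(registry, {"spellcheck", "french"})
print(best.name if best else "no service found")
```

Because the lookup happens outside the requesting code, the registry can gain, lose, or re-rank services over time without that code being touched.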


We can see that the important dependencies here are semantic and beyond just 
matching the *kinds* of replies we are hoping for. 


But it’s also the case that a very large percentage of the gotchas down the road are 
from too many dependencies that are not understood (and some not even known). 


So it's worth quite a bit of work in any system intended to be real and to last for
years, to do the extras needed to guarantee semantic integrity. This will make an
enormous difference in the viability along many dimensions of the system.


The most important principle in using “principles” is to check whether the “principles” 
are actually at a level that will help overall. 


Alan Kay 
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 3y 


Why is VSCode nearly 300 megabytes while Sublime Text is just 16 
megabytes? 
“Ouch!” on both the 300MB and the “just 16” ... yikes! 


Alan Kay


Still trying to learn how to think better - Updated 3y 


What are some good examples of liberal arts points of view brought to 
computing? 


“Liberal arts points of view"? 


Although I went to an “engineering” high school ("Brooklyn Technical HS"), I should
have gone to “Bronx HS of Science” or “The High School of Music and Art”: the
matchups are partly a matter of temperament. However, one of the most useful things
that happened to me was what I encountered at BTHS.


After the Air Force in the early 60s, I wound up at the University of Colorado in “The
College of Arts and Sciences” with two majors (Pure Math and Biology), and two
minors (English and Anthropology), but spent most of my time in the theatre,
especially composing music for shows, playing in pit orchestras, and doing various
kinds of theatrical design. I also took courses in the arts school.


I had learned to program computers in the Air Force and used this as a half time job
at the National Center for Atmospheric Research in Boulder to pay for my tuition, 
room, and board whilst there. 


Too much of all of the above led to overload by the time I graduated in 1966, and I
couldn't bear the thought of either a real job, or going to grad school in math or bio.


This led to the idea to take a year off and learn something about computing (like most 
programmers, I knew essentially nothing). This led to finding a master's degree
program at the University of Utah (mainly because it was also in the mountains). 


This turned out to be lucky, because Utah had just started up an ARPA project — 
mainly to invent continuous tone 3D graphics (it did) — but also to generally work on 
the larger areas of interactive computing, especially with graphical displays. 


This was a revelation, especially encountering Ivan Sutherland's Sketchpad before 
even getting a desk, and a few days later: Simula I. 


But much larger was the "ARPA Dream": “The destiny of computing is to become 
interactive intellectual amplifiers for everyone, pervasively networked worldwide.” 


In the terms of the present question: the “ARPA Dream” was a “liberal arts” vision from 
many different perspectives. 


Another context in 1966 was “computer science” as an aspiration. The first Turing
Award winner, Al Perlis, explained: “Computer science is the science of processes: all
processes”.


The term “Software Engineering” was not coined yet, but the realization was around
that this new set of soft-technologies had both engineering components — beyond
“Electrical Engineering” — like designing and building bridges — and also science
components — like the scientific study of bridges and other constructed things. (See
Herb Simon's “The Sciences of the Artificial”, etc.)


Human beings were the main reason for the ARPA Dream. What are they/we? What 
are we trying to do? How does our cooperative sociability play off against equally 
built-in impulses for competition? How do we communicate? How do we elevate our 
built-in kinds of thinking with invented kinds of thinking, and how do we use and 
invent representations to bootstrap processes and communities — e.g. science — that 
are smarter than we are? And how do these representations in turn affect the ways we 
communicate and the new ideas we might have? 


A key property of the ARPA community (and the subsequent addition of Parc, which 
grew out of it) was that it was more cooperative than competitive, and very much 
followed both the general way science operates, and most especially how scientists 
and engineers learned to deeply cooperate during WWII and the Cold War. 


Because of this — and especially because of the tone set by the ARPA research founder
Licklider (a research psychologist) — the community was very open to ideas from 
everywhere, not just the hard sciences and engineering. 


Most especially, it was devoted to “problem finding” not just trying to solve existing 
stated problems (it realized that when you want real progress, you are asking for 
departures from existing contexts, and this means that problems defined in the 
existing contexts may not be good, and could mislead). 


To pick just a few of the ARPA venues that were deeply into these larger issues: MIT AI,
CMU (AI, CS), SRI (Engelbart), then Parc, added in the 70s, etc.


I wound up never leaving this community.


In order not to run on even longer, I'll just note that pretty much everything I had
picked up in my larger interests wound up getting used in the projects I worked on
and initiated.


Biology: Being able to see the relationships between real scaling and what needs to 
happen with large networks of processes intercommunicating. Distributed systems. 
Seeing what Sketchpad and Simula were instances of. Morphogenesis as the future of 
large systems. Etc. 


Math: Being able to see “algebras” that the existence of computational 
modules/processes could give rise to. Seeing that “intensional logic” was a more 
powerful way to think about larger computer structures than “extensional logic”. Being 
able to see what McCarthy was really doing with Lisp. The computer as a metamedium 
for “all meta”. Being able to think mathematically when inventing a programming 
language. Etc. 


English: What it means to read, write, create literature. How learning these affects deep
thinking. What it means to understand from representations. Why we need to start 
with children and inventing computing and “literature” that is good for them. How the 
ARPA dream resembled invention of writing and then the invention of the printing 
press. "McLuhan". Etc. 


Theatre: Why theatre works with humans. What it tells us about communication 
between entities of all kinds. How this can be used to invent and design a universal 
GUI. Etc. 


Anthropology/Linguistics: Context and culture, and how this needs to affect learning 
and especially the education of children. Cognitive psychology as a subset of cultural 
anthropology. How humans actually learn and use languages. Design to elevate
whole cultures. Etc. 


History, and of Ideas, etc.: I don’t know where to stick this, but I think the largest
influence on me, and hence on things I wound up doing, was a very wide range of
reading. Besides the simple idea of furnishing many more possibilities for seeing
analogies, I think the big deal here is both the size of the “idea-world” that is revealed,
and also the range of human thought: most especially including the mostly weak
varieties of thought over the ages, with occasional flashes of the most amazing ideas
and accomplishments. “Problem finding” is a philosophical idea.


Music and Art: how the “configurative” arts affect us, and what it means to 
communicate in figurative ways. Esthetics as a deep modality. Etc. 


Engineering: this became both a hobby — to be able to do engineering thinking after 
and sometimes during inventions — and most especially to work with the many 
genius engineers I was privileged to be around back then. Etc.


So, from where I am in the peanut gallery, I'd say the “liberal arts” had a lot to do with
many fundamental inventions that I was — with others — directly involved with,
including: problem finding, personal/tablet computing, the bit-map screen, the Parc 
GUI, dynamic message passing objects, distributed systems of many kinds, etc. 


A provocative question for readers: can you see the lack of “liberal arts thinking” in 
where many of these early inventions have wound up today? 


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 3y


What is the relationship between performance and abstraction? 

This question needs more to it. For example, does the “abstraction” contain 
requirements for certain levels of performance? (as we'd expect if the abstraction is a 
“runnable requirements” language). If the result meets the requirements, then the 
relationship is 1:1. 


As another example, it was sometimes the case in the 70s that the excellent 
interpreted APL at the Stanford Linear Accelerator Center (SLAC) would outdo lower 
level FORTRAN programs. In part, this was because the higher dimensional internal 
operations of APL had been highly optimized, and also this APL "knew" more of what 
it was trying to accomplish (see Phil Abrams’ thesis). 


In any case, there is certainly no simple relationship between the two. 


Alan Kay 
Still trying to learn how to think better - Updated 3y
How did drawing work on the Alto? 


Parc started up in 1970-1, coincidentally with the advent of the first VLSI chips: the 
1103 dynamic RAM from Intel (all of 1024 bits, but in a much smaller package, lower
voltage, and lower cost than core memory). 


For the first time, just a few people could make an entire mainframe “pretty easily”,
and we used the 1103 in 1971 to make an emulated PDP-10 (this was perhaps the first
whole computer to use an entirely integrated circuit main memory).


We did many experiments for displays, all of them driven by digital bit-maps of the 
characters to be displayed. This allowed some outlaw thoughts, such as: I wonder if we
could afford to use the 1103s for a general bit-map display buffer memory for a 
personal computer? 


There were lots of reasons why we soon determined to afford this. One was that none 
of the previous computer displays were general with respect to the images that could 
be displayed: but a bit-map display could be general if there were enough pixels. Gary 
Starkweather was starting to print bit-maps on his new laser printer — these were 
general. Ben Laws and I did quite a few early bit-map character sets and started to
understand what resolutions were needed, and Dick Shoup and I started looking at
display of images using both half-toning and continuous toning (as the U of Utah was 
using for its 3D rendering). 


Making an early “printing quality” bit-map character in the font Lydian Cursive 
(1972). The character in the size it will appear on the display is shown at the top 
left. This also is a good capture of the phosphor color we used at Parc. 


A page of text rendered using the Lydian Cursive font on the experimental 
display system (1972) 


Cutting to the chase ... 


Because we liked — whenever possible — to do an extensive experiment before trying 
for a combination of invention and engineering, we used our experimental display 
generator in 1972 to try out a number of avenues that a bit-map display could 
support. These included character display in printing-like fonts (above), half-tone 
images and painting and generating them, and “2 1/2 D" Disney-like real-time 
animations in multiple levels of depth. We did the same for real-time multiple timbre 
generation for music playing. 


The first digital drawing and painting at Parc was on a system I designed — which was
fabulously well built, and with a number of additional designs by Steve Purcell, a truly 
gifted Stanford student who was an intern in our research group. 


I did most of the early drawings and paintings. We first used an actual
Engelbart/English mouse from SRI for this (which was a bit tough), but the Alto project 
also produced a much smaller and higher res digital mouse that was much better 
behaved, especially for drawing. 


Quora will not allow me to put up “actual bits” here, so I advise that you copy these
images and look at them in “actual bits” to see what is actually there — otherwise the
single-bit images will produce moire patterns with the displays you are using. 


Drawing Pooh 


Sequence of painting the Pegasus 




The menu of halftones and brushes can be seen at the top of the screen. 


These strengthened the case for going all out on the display for the personal 
computer for Parc that a number of us wanted to do (later named the “Alto”), and this 
project was started (under the radar) in Nov 1972. 


We had adopted Butler Lampson’s urgings to have every Parc invention engineered 
for 100 users, and this meant we had to be able to build at least 100 Altos. The 
original desired cost for the Alto was around $12.5K in 1972 dollars, but it wound up 
being more like $16K (or a bit more). This was before microchip CPUs, and they would 
have been too slow for many years in any case. So the CPU was made using MSI 
components (bit-slices etc) on a few boards. 


The memory budget was 128KB (in 16 bit words), and the tooth gritting part was to 
decide to allocate fully one half of this to “just the display buffer” of about 500,000 
pixels (808 x 606). In practice, the Alto could allocate regions of memory for display 
purposes, and had a “fat-bit” mode, so there were many possible tradeoffs to put up a 
“full-paged display”. Pragmatically, this allowed working memory in the Alto to range 
from about 64KB to around 80KB, and a bit more for very restricted purposes. 
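
As a rough check of these numbers, here is the arithmetic as a small Python sketch. It assumes one bit per pixel (as stated elsewhere in these answers), takes 1KB to mean 1024 bytes, and ignores any per-scan-line padding; the variable names are just for illustration.

```python
WIDTH, HEIGHT = 808, 606        # pixel dimensions quoted above
BITS_PER_PIXEL = 1              # assumption: one-bit-per-pixel display
BITS_PER_WORD = 16
TOTAL_BYTES = 128 * 1024        # assumption: 1KB = 1024 bytes

pixels = WIDTH * HEIGHT                               # 489,648 pixels
buffer_bytes = pixels * BITS_PER_PIXEL // 8           # 61,206 bytes
buffer_words = pixels * BITS_PER_PIXEL // BITS_PER_WORD
fraction_of_memory = buffer_bytes / TOTAL_BYTES       # a bit under one half
remaining_kb = (TOTAL_BYTES - buffer_bytes) / 1024    # left over for everything else

print(f"{pixels} pixels -> {buffer_bytes} bytes ({buffer_words} words)")
print(f"{fraction_of_memory:.1%} of memory, {remaining_kb:.1f}KB remaining")
```

So a full-page bit-map takes roughly 60KB, close to half of the 128KB, which is consistent with working memory starting in the mid-60KB range and growing when less of the screen is mapped.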


Most important was that the Alto was microcoded with dynamically reloadable 
microcode, so almost every part of what is normally hardware was emulated. This 
allowed many experiments with the display system (entirely in microcode), including 
both Steve Purcell’s "CHAOS" 2.5D animation code, and Dan Ingalls’ now
everywhere in the world: BitBlt.
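
For readers who have not met BitBlt: the core idea is to copy a rectangle of bits from a source bitmap into a destination bitmap, combining each source bit with the destination bit under a chosen rule. The sketch below is a deliberately naive Python version. It stores one bit per list element instead of packed 16-bit words, does no clipping or overlap handling, and has nothing to do with the Alto microcode; the function and rule names are hypothetical.

```python
from operator import and_, or_, xor

# Combination rules: each takes (source_bit, destination_bit).
RULES = {
    "store": lambda src, dst: src,  # overwrite the destination
    "or": or_,                      # paint on top
    "and": and_,                    # mask
    "xor": xor,                     # reversible paint
}


def bitblt(dst, dst_x, dst_y, src, src_x, src_y, width, height, rule="store"):
    """Combine a width x height block of src into dst, bit by bit."""
    combine = RULES[rule]
    for row in range(height):
        for col in range(width):
            s = src[src_y + row][src_x + col]
            d = dst[dst_y + row][dst_x + col]
            dst[dst_y + row][dst_x + col] = combine(s, d)


screen = [[0] * 6 for _ in range(4)]
glyph = [[1, 0],
         [1, 1]]

bitblt(screen, 1, 1, glyph, 0, 0, 2, 2, rule="or")   # draw the glyph
after_draw = [row[:] for row in screen]
bitblt(screen, 1, 1, glyph, 0, 0, 2, 2, rule="xor")  # xor it away again
```

The same single operation, with different rules and rectangles, covers character display, painting, scrolling, and moving images around, which is a large part of why it spread so widely.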


The small high res optical shaft encoder early Alto mouse 


The Alto with the Cookie Monster, now animated with a Pegasus, and about 80 
“ping-pong” balls at about 10 fps. Animation system: “CHAOS” by Steve Purcell. 


And here is what I wish Quora — and all places for text and pictures on the net —
would do: allow a drawing to be done in place in a galley of text and images (why not
for goodness sake!). This is from around 1974-5 on the Alto in Smalltalk. The idea in
the UI is that it will sprout in a frame around the object when you go into it (UI and
this painting system by Ted Kaehler). The image/painting object had most of the
features of MacPaint 10 years later.


Note the multiple fonts in the text, the live font editor which is being used to 
render the text into the Elvish script Tolkien devised, and the drawing in place in
the image of Frodo and Gandalf. 


Below is a little movie of the place within Parc that we built for learning with children 
ca 1975 or so. You will see that some of the Altos still just protect the users from the 
display using cardboard containers. 


The music — "The Happy Hacker” composed by Chris Jeffers of our group — is played
in real-time on the Alto using Steve Saunders' FM synthesizer and Ted Kaehler’s Twang
music system. The sound this produced was really good. It has been compromised a
bit from several generations of recordings, 16mm film tracks, sprocket jitter, etc. But I
think you can still hear the polytimbral voices pretty clearly on headphones.




A few years later, the Notetaker Smalltalk kernel was done for both the Notetaker and 
the Dorado (called “Smalltalk-78"). Here is a screen in actual pixels (again: please copy 
and display elsewhere to get rid of the moire on the background) from a revival of this 
system (from a file on a Xerox disk pack) using the Dorado screen format. 


Screen image: “Learning Research Group”


You can see this system being demonstrated, including the painting, in the tribute 
video I did for Ted Nelson a few years ago:


This system is partly of interest because it was the one that was used in the infamous 
“Parc Steve Jobs" demo. 


Alan Kay
Have designed a few programming languages - 3y 


To what extent did Simula influence Smalltalk? 


To a large extent. Basically: Simula cannot be praised too highly — and along many 
dimensions. 


I was asked to write a comprehensive history of Smalltalk for the ACM's “History of
Programming Languages” — called “The Early History Of Smalltalk”. This can be found
online. One version of it was posted by Bret Victor.


The original history was published by SIGPLAN in 1993.


Here you will find all the influences I was aware of that helped me in 1966 come up
with the notion of “all semantics/computations/everything being in terms of messages
between logically complete computers”. For this, Simula I was a catalyst after —
especially — seeing Ivan Sutherland's Sketchpad a few days earlier.


You will also find how Lisp came to play an important role to replace the Simula-Algol 
scheme with a late bound reflective one, and how Dave Fisher's CMU thesis (1970) was 
key. 


Finally, the 4 different Smalltalk designs at Parc and how they happened are examined 
in some detail. 


Alan Kay
Have designed a few programming languages - 3y 


What was the historical significance of FORTRAN? 


The historical significance of FORTRAN is primarily due to the level of software 
engineering in the system, but there was also a “four minute mile” effect from the 
combination of “early” and “efficient”. 


An earlier significant system along the lines of “runnable mathematics” was by Laning
and Zierler — an interactive system called “George”, ca 1952 — on the Whirlwind I
computer at MIT. The syntactic and semantic facilities provided by George exceeded 
those of FORTRAN, and it also was one of the very first interactive programming 
systems. (Apparently FORTRAN was already started at IBM by 1952.) 


Especially in the 50s and into the 60s, with tiny and slow hardware, hand optimization
of code in assembler was one of the main jobs for programmers. The IBM group
headed by John Backus determined to make compiled code compete with optimized
assembly code in both speed and space. This led to long, many pass compile times —
thought of as a luxury back then (programmers were cheap), but the results were
quite good enough in many cases.


The extent to which IBM carried this out was astonishing. For example, you could
actually write and run FORTRAN on the 8KB 1401 at USAF Air Training Command
(where I worked in the early 60s). I think (I hope someone will track this down) that the
FORTRAN compiler would do over 100 passes — from tapes to tapes — to do a
compile. This was somewhat frowned on by the brass at the time.


FORTRAN's (relatively) efficiently running code produced two distinct kinds of user 
groups: (a) scientific programmers and (b) job shop programmers who wanted to get 
certain programs running well enough and quickly. The FORmula TRANslation part 
was limited to just arithmetic expressions: everything else in the early FORTRANs was 
quite basic and at the level of control structures and simple macros in assembler. 


Alan Kay
Have designed a few programming languages - 3y
What are your favorite analogies to explain programming? 


I have sometimes used “creating a banquet” or “organizing a restaurant”.


The “following a recipe” analogy is included as a tiny part of the larger processes, but 
it doesn’t capture many of the important parts of programming, which involves 
design, organizing many processes — including intelligent entities — to cooperate at 
many different levels, the systems designs, the many heuristics that deal with 
“variance”, the learning involved, .... And so forth. 


If the above is to work as an analogy, I should not have to give more of an
explanation. 


Alan Kay
Have designed a few programming languages - 3y


Is it possible to learn two programming frameworks at the same time? 



In that galaxy long ago and far away (the 50s and early 60s) there were no standard 
operating systems, and most programming was done in assemblers that were 
symbolic representations of the underlying machine codes. And many “shops” had 
several computers of different types, and there was migration ... 


So most programmers back then had to learn multiple machines, assemblers, macro 
systems etc. This led to a kind of “abstract machine” that one wound up thinking in for 
programming. Some of the better programmers wrote portable macro systems that 
they would move from machine to machine, that would translate into whatever 
assembler existed (e.g. Bruce Baumgart had a really nice one). 


The machines were quite idiosyncratic back then, but the tendency was to
homogenize them at the next level of abstraction. 


Soon after FORTRAN in the late 50s, there was an explosion of “high level” languages 
starting in the early 60s, and many of them were pretty specific about their application 
areas. 


Most programmers would learn several of these as well. This led to a very different 
realization: that it was really good to learn different kinds of programming languages 
because it expanded one’s thinking and approach. For example, a popular trio was 
Algol, APL, and LISP. 


However, there was a lot to be said for homogenization as well. This led to the scheme 
called UNCOL (pronounced “uncle”) by Ted Steele (UNiversal Computer Oriented 
Language). The idea was to have this more as a universal target for HLL compilers than 
to be directly programmed in. 


C's best role was to act as this target. But currently this role is filled by LLVM pretty 
well (especially as compared to the more modest goals of UNCOL). 


It should be pretty clear that learning only one programming scheme is terribly 
limiting, and this is already within a very (so far) limited range of options. 


Alan Kay
Still trying to learn how to think better - 3y


Why do musical artists - even the best ones - lose their ability to write a 
memorable new melody, far before they lose the ability to sing or play? 


For the same reason that it is much easier to read one’s own language than to write in 
it — especially to write beautifully. 


Alan Kay
Still trying to learn how to think better - 3y


Could the piano have more than 88 keys? 


Take a look at the Boesendorfer Imperial Grand. It has 9 extra keys in the bass that 
extend the compass down to a low C (so 97 keys). 


Even more important is that Boesendorfers are wonderfully sounding instruments, 
with a very creamy “French horns to flutes” sound in the tenor through soprano range.


Alan Kay
Still trying to learn how to think better - 3y


How many of ARPA/PARC's researchers were married while they did their 
breakthrough work? 

With regard to the computer researchers there, I'd say “quite a few” (I don't know how
to be more precise). And I would guess that the physical sciences researchers were
even more so. 


Alan Kay
Still trying to learn how to think better - Updated 3y

Can you explain AI to a novice?

Many single-celled microorganisms are able to detect toxic conditions — temperature, 
acidity, etc. — and initiate a “tumbling” behavior that finds random directions that can 
move them away from the danger. Similarly, many can detect when “things are good” 
and even guess at “better” and move in that direction. 


A thermostat is a machine that can detect “good” and "bad" and initiate reverse 
actions to counter “bad”. 


W. Grey Walter created a “robot tortoise” in the late 40s that used this idea to avoid 
being trapped by obstacles, find its “nest” for recharging its batteries, etc. 


A time lapse photo of a Grey Walter turtle finding its nest. 


The next step up from mechanisms that can stabilize a goal, was also illustrated 
beautifully by Walter, in having one of his tortoises be able to do “conditioned 
learning" (like Pavlov's dog), to associate the sound of a whistle with a “bump” and to 
initiate the “bump avoidance” behavior when a whistle was blown. * 


Walter called this “The 7 Steps Between Chance And Meaning", and used this to make 
an amazing design and mechanism. 




An “Etoy” to implement Grey Walter’s 7 Steps. 


Steps 1, 2, 3 notice a coincidence that could be significant, which is remembered 
briefly (4). If there are enough of these in a short enough period (5) this is 
remembered as significant in a longer-term memory. After this a stimulus (6) will 
be looked up in the long term memory (7), and if this has a memory, will initiate 
the behavior without needing the original stimulus. 
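
The caption above can be turned into a very small program. The sketch below is a simplification under stated assumptions, not Grey Walter's circuit and not the Etoy shown: the class name, the threshold of three coincidences, and the ten-tick window are all hypothetical choices made for illustration.

```python
from collections import deque


class ConditionedReflex:
    """Learns to treat a whistle as a warning after repeated whistle+bump events."""

    def __init__(self, needed=3, window=10):
        self.needed = needed    # coincidences required (hypothetical threshold)
        self.window = window    # how long a coincidence is remembered, in ticks
        self.recent = deque()   # short-term memory: times of recent coincidences
        self.learned = False    # long-term memory: whistle now "means" bump

    def step(self, t, whistle, bump):
        """Advance one tick; return True if avoidance behavior is triggered."""
        # Short-term memories fade.
        while self.recent and t - self.recent[0] >= self.window:
            self.recent.popleft()
        # Notice and briefly remember a coincidence.
        if whistle and bump:
            self.recent.append(t)
            # Enough coincidences close together become a long-term memory.
            if len(self.recent) >= self.needed:
                self.learned = True
        # Once learned, the whistle alone triggers the behavior.
        return bump or (whistle and self.learned)


trained = ConditionedReflex()
before = trained.step(0, whistle=True, bump=False)   # whistle means nothing yet
for t in (1, 2, 3):
    trained.step(t, whistle=True, bump=True)         # three quick coincidences
after = trained.step(4, whistle=True, bump=False)    # whistle alone now works

untrained = ConditionedReflex()
for t in (0, 20, 40):
    untrained.step(t, whistle=True, bump=True)       # too far apart to count
spread_out = untrained.step(41, whistle=True, bump=False)
```

Here `before` is False, `after` is True, and `spread_out` is False: the association only forms when the coincidences arrive close enough together to look significant.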


** I wrote an “active essay” for children using these ideas “Marvin Minsky And The
Ultimate Tinkertoy” 


Marvin Minsky And The Ultimate TinkerToy ***




What's fun is how the rest of what we call “intelligence” can be made from these two 
examples (notice e.g. that the conditioned “stimulus” has become a kind of “symbol”).


“ML" comes directly from the second example — it is a kind of “curve fitting” or 
“function finding” or “correlation”. 


But the most important part of “real artificial intelligence" comes from being able to 
do things with “symbols”. Much of this has gotten lost in the last few years because it 
is much more difficult than the curve fitting parts. But it should be readily apparent 
that trying to make autonomous mechanisms that don’t know what they are doing — 
the tragic Boeing 737 Max should be a sobering reminder — is a very bad path to 
take. It's imperative that symbolic mechanisms that can deal with contexts — not just 
“fits” — be developed. 


Daniel Kahneman**** calls the curve-fitting fast reaction mechanism in human brains 
“System 1", and the much slower but much more intelligent cognitive mechanisms 
“System 2". Both of these are needed for any kind of reasonable “Artificial Intelligence” 
that could affect human lives. 


A systems organization for “artificial psychology” was suggested by Marvin Minsky in 
his “The Society of Mind” and “The Emotion Machine" books.***** One of the many 
interesting problems Marvin tackled was how different parts of a brain/mind could 
communicate with each other by only using mechanisms that were biologically 
plausible. 


“Real artificial intelligence” doesn’t need to do things biologically, but it is very likely 
that any real solutions will have been “biologically inspired". For example, consider the 
problems of relating the kinds of processes that ML can build with the kinds of 
relationships employed in human use of language. This is not a trivial problem. It's 
clear that humans don't do this very well either, and this is probably a main clue. 


* Grey Walter: A machine that learns, Scientific American (1951) 185(2):60—63 


*** Yoshiki Ohshima helped greatly in getting the “active” part embedded and 
working, using John Maloney’s “GP” interactive language for children. 


**** Daniel Kahneman “Thinking: Fast and Slow” 
***** Marvin Minsky: “The Society of Mind", and “The Emotion Machine” 


Alan Kay
Still trying to learn how to think better - 3y


What did you think of the work being done at General Magic? 


I was “in the conversation” (along with John Sculley, etc.) when this project and then
company was set up. Basically, the idea was to make a hand-held device that would 
embody something like the next levels of HyperCard as a cradle for many kinds of 
services gotten via networking. 


Marc Porat was the driver of this project, and wrote to John Sculley in 1990: “A tiny 
computer, a phone, a very personal object ... It must be beautiful. It must offer the 
kind of personal satisfaction that a fine piece of jewelry brings. It will have a perceived 
value even when it's not being used... Once you use it you won't be able to live 
without it." 


Sound like an iPhone? Actually the deep concept was even more so. Quite a few of the 
original Mac team — Bill Atkinson, Andy Hertzfeld, Susan Kare, etc. — wound up
working on this. 


Outside of Sculley, “Apple Management” was not interested in this (in the same way
they were not interested in HyperCard, and any number of other good ideas around 
this time). This general lack of interest eventually brought the entire company to its 
knees just a few years later. 


Many good things were done by the new company wrt both hardware and software 
(worth checking out via Google for those interested). I especially liked the UI, the
extension from HyperCard, and how the end-user programming was thought of. 


I believe this product could have been a success with the “old Apple” behind it. It was
really pushing the state of the art for its day and thus required even more of the top
levels of people than it had, plus a sustained marketing push that only a very large
company could make happen. 


One could argue that it was too far ahead of its time in that it required too few actors 
to do so many of the myriad things that gradually happened as the Internet took hold. 
However, what we wound up with (via the WWW and its browser) was quite terrible by
comparison, and this has been greatly limited by the limited concept of the web and
its browser.


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - Updated 3y
Why was Alan Kay's Smalltalk in the Xerox Alto sufficiently efficient for a 
desktop GUI in 1979 but it isn't sufficient today? 

The two answers by Eliot Miranda and Mark Miller are very good, so there is little to 
add here. But I should say a few words about the general Parc approach to HW
architectures (and more can be found in other answers I've given). 


First, though, is to compare a one-bit-per-pixel architecture on a 16 bit word machine 
to a 24-bit-plus-alpha architecture on a 32 or 64 bit word machine. There's a factor of 
8 to 16 difference per memory cycle (3 to 4 doublings of Moore's Law) just for this. 
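
The factor quoted here can be checked with a few lines of Python. The sketch assumes the comparison is simply "pixels delivered per memory word", with 24-bit color plus 8 bits of alpha counted as 32 bits per pixel; the function name is hypothetical.

```python
from math import log2


def pixels_per_word(word_bits, bits_per_pixel):
    """How many pixels one memory word carries."""
    return word_bits / bits_per_pixel


alto = pixels_per_word(16, 1)   # 1 bit per pixel on a 16-bit word: 16 pixels

factors = {}
doublings = {}
for word_bits in (32, 64):
    modern = pixels_per_word(word_bits, 32)   # 24-bit color + 8-bit alpha
    factors[word_bits] = alto / modern        # how much more the Alto gets per word
    doublings[word_bits] = log2(factors[word_bits])

print(factors)     # {32: 16.0, 64: 8.0}
print(doublings)   # {32: 4.0, 64: 3.0}
```

A 32-bit machine moves one such pixel per word and a 64-bit machine moves two, against sixteen on the Alto, which gives the 16x and 8x figures and the 4 and 3 doublings respectively.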


The problem with the 16-bit micros in the early 80s — such as the Motorola 68000 
(used on the Mac) and the Intel 8086 (used for IBM PC compatibles) was not just cycle 
time but also the fundamental difference between being able to stick “hardware like 
functions” (such as a byte code interpreter, graphics painting, music, etc.) into a 
microcode engine that could run about 5-6 times faster than main memory vs. having 
to do this with a CPU that is using main-memory cycles for fetching its own code and 
data while also trying to do low-level functions. 


The Alto was fabulously efficient because of its approach to how its real CPU (the
microcode engine) interacted with the available memory bandwidth (including the use 
of multiple program counters for zero overhead low-level task switching and avoiding 
interrupts). 


Whereas the commercial CPU architectures had very poor notions of how to use 
memory and how to emulate low-level functionality. This made them extremely 
inefficient for any given clock cycle they could muster. 


We put in quite a bit of energy during the last part of the 70s trying to explain 
software needs and how this should affect hardware design to both Intel and 
Motorola — and giving them the same kinds of demos that Steve and the Apple 
people saw — but to no avail. Neither company ever got interested and savvy in 
software. 


In the 80s, this was even more difficult because the IBM PC architecture — for a
deeply naive market — was quite successful (this is like the WWW/browser, which is a
terrible architecture, but the market can't see it). 


When at Apple, old Parc hands urged the development of custom CPU chips and 
other custom hardware, but this was rejected for almost two decades (Steve 
mentioned this in the introduction of the iPhone). 


A simple bottom line is to notice how difficult it was and how long it took for the
industry to wake up to the need for GPUs, despite the fact that screen painting was a
major need for every personal computer and workstation. This kind of blindness is
almost a defining characteristic of “computing” but I've never been able to understand
quite why — especially given how successful Parc was because it was willing to do any
and all hardware that advanced software required.


Alan Kay
Still trying to learn how to think better - 4y 


Why are we born into sin? 


Here’s an attempt at a non-religious answer. 


We humans have many paradoxes that we deal with quite unevenly. For example, 
genetically we have urges to cooperate — we would not be social animals if we didn't 
— and also to compete. We desire to be in a group, and — once there — we start 
competing — and often cheating — once we have a bit of cooperation assured. The 
conflict between these two genetic drives has caused a lot of trouble and misery. 


This and other such paradoxes can be recognized as such, and have led to various 
theories of “good” and "bad” behavior. Some of these are very compete-centric, such 
as social-darwinism, and some are very cooperative-centric. 


Another part of our genetics has to do with stories, how we use them as explanations,
and how we invent what stories need in order to be even better stories. This and a few
other tendencies we have readily lead to the notion of “sin” — a more emotionally charged
version of “bad”. It doesn’t require a deity, but very often a deity is part of some of the
larger stories cultures comfort themselves with.


A related example is the idea of “soul”, which is often reified into a thing, and also very 
often connected with beliefs in deities. But notice how useful this is if it is kept as an 
idea and concept: it is the thing about yourself that you most don’t want to lose. 
Similarly, “sins” are those things that go against your best conceptions of yourself and 
the societies you live in. 


Alan Kay
Still trying to learn how to think better - 4y
Why do Fender guitars have different headstocks than other guitars? 


Just to introduce the concept that there were things happening before the near 
present ... i.e. in the past. 


One of the first guitars to use an inline headstock was the Fabricatore ca 1806. 


The idea was used in many guitars by the top German builder of early Romantic 
guitars: Stauffer. Here's an example from 1830. 


Alan Kay
Have designed a few programming languages - 4y 


Which programming language is best for developing a website that can 
scale to over 100 million users? 

Originally Answered: Which software language is best for developing a website that can scale to 
over 100 million users? 

By the end of 2020 it is estimated that there will be about 50 billion computers 
connected via the Internet. How to invent such an architecture—and one that never 
has to be shut down for fixing and growing— was worked out by the ARPA-Parc 
research community over some years centering on about 45 years ago. 


This is not primarily a programming language problem, but as with most things that 
scale, it is a design problem. 


The question is missing too much information to go much further (for example, what kinds
of services, what kinds of interactions between users, etc.).


But it wouldn't hurt at all to start with the idea of a distributed system here, and to 
think about how to do sharing with processes that are distributed (and possibly 
redundantly sited as well). 


A really fun (and “funny") way to do it — and good for folks to ponder — is to note 
that everyone who will be using these services will be doing so through a very 
powerful and high capacity computer (laptop, phone, etc.). So there is already a 
distributed processing system of 100 million potential processors and storage. The 
Croquet systems we've done over the years show that a considerable amount of the 
load can be distributed and shared via the users themselves ... 
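
One hedged way to picture "the users themselves carry the load" is replicated computation: a very light central service only puts events in order, and each user's machine applies those events to its own copy of the model. The sketch below is an illustration of that general idea in Python, not the Croquet API; `Sequencer`, `Replica` and the counter "application" are all hypothetical names invented here.

```python
# Hypothetical sketch: a tiny sequencer orders events; every user's machine
# applies them to its own replica, so the heavy computation is done by clients.

class Sequencer:
    """Does no application work: it only stamps events with a global order."""

    def __init__(self):
        self.log = []

    def submit(self, user, action, amount):
        event = (len(self.log), user, action, amount)
        self.log.append(event)
        return event


class Replica:
    """Deterministic model held on each user's own computer."""

    def __init__(self):
        self.counters = {}
        self.applied = 0

    def catch_up(self, log):
        for seq, user, action, amount in log[self.applied:]:
            if action == "add":
                self.counters[user] = self.counters.get(user, 0) + amount
            elif action == "reset":
                self.counters[user] = 0
            self.applied = seq + 1


sequencer = Sequencer()
laptop, phone = Replica(), Replica()

sequencer.submit("ann", "add", 3)
sequencer.submit("bob", "add", 5)
laptop.catch_up(sequencer.log)        # laptop sees the first two events
sequencer.submit("ann", "reset", 0)
sequencer.submit("ann", "add", 1)
laptop.catch_up(sequencer.log)        # laptop applies only the new events
phone.catch_up(sequencer.log)         # phone joins late and replays everything

print(laptop.counters, phone.counters)
```

Because the model is deterministic and every replica sees the same ordered events, the replicas agree without the central service ever computing the application state.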


Once you work out a good such architecture, it is likely that really convenient and 
flexible dynamic languages will have plenty of computing power for most of the 
needed tasks. 


"As scaling increases, architecture dominates materials” (some old fart said that many 
decades ago, but guess what?) 


Etc. 


Alan Kay
Still trying to learn how to think better - 4y 


What were the biggest mistakes made by Xerox? 


Originally Answered: What were the biggest mistakes made Xerox? 


It's worth noting that IBM had no idea about personal computing (they would never 
have farmed out the OS if they had taken it seriously), and didn’t know how to sustain 
their early success. 


Nor did DEC. 


Apple got the GUI, but not networking. And, Apple didn't understand what Microsoft 
did (Apple was for quite a while the second-largest software company in the US, but 
they were quite timid about this). 


Microsoft did not understand the Internet. 


The people who did the web and the web browser didn't understand “personal 
computing.” 


Etc.

Xerox was earlier — and a copier company — and didn’t understand most things.


All of the above is worth pondering because the ARPA-Parc research community did 
have a very good picture of the whole gig, and this was what allowed it — and 
especially at Parc — to invent most parts of the future as a well fleshed out ecology 
covering the hardware, software, GUI, OS, etc. 


In the early 60s, I made my living and worked my way through college as a
programmer on mainframes in the Air Force and the National Center for Atmospheric
Research. I had no knowledge of what ARPA was doing, and more importantly for this
answer: no overlap with their sense of “desirable destiny” when they looked at
computing.


The only difference was that in 1966, when | accidentally wound up at an ARPA- 
supported grad school and saw the “ARPA Dream” and what they were trying to do, | 
went crazy with the sweep, scope, and destiny of it all, and joined the other “crazies” 
already in the ARPA community as one of a few hundred. 


What made it impossible to even think about anything else is that the "ARPA Dream” 
was sustained by deep engineering and science — there was no question that the 
“dream” was actually doable: we just had to figure out how. 


None of the other organizations above really could see the “ARPA Dream,” nor that it 
required the invention of a whole new system of hardware, software, networking on 
every scale, advances in human factors in many directions, etc. 


Each organization picked out something they could make money with and that 
appealed to their local goal structure (including Xerox, which only latched onto the 
laser printer). But none of them were willing to abandon their old contexts and move 
wholeheartedly into a quite new total context. 


This “goal conservatism” is also called “instrumental reasoning” (any new idea or tool 
is judged by how well it contributes to current goals). Most of humanity has this trait 
(and it's easy to see why). 


My view, looking back, is that there were a few people in the right place at the right 
time who could see new goals when they looked at computing, and the funding of 
Licklider by ARPA was a huge lucky step. This funding, and the activities it gave rise to, 
attracted people of like mind, and — as with me — those who could “see when 
shown.” 


So: | look at most corporate shortsightedness as “unfortunately normal” and at ARPA 
as “unusual” and “lucky.” 


Alan Kay


Have designed a few programming languages - 4y 


What happened to Dan Ingalls' Smalltalk interpreters?


| think quite a few of them are around and working in different forms. Dan has given 
talks showing live examples from almost all of the Smalltalks for which he was the 
main kernel force. Many of these talks are on YouTube. 


Online, there are playable versions of Smalltalk-76, and a usable version of 
Smalltalk-78 that were brought back to life primarily by Dan and Bert Freudenberg. 


There are also playable versions of Smalltalk-72 in several forms, including on the 
excellent Alto emulators done by Josh Dersch at the Living Computer History Museum 
in Seattle. 


Alan Kay
Still trying to learn how to think better - 4y


What parts of Engelbart's Augment/ARC work are still missing from
modern groupware, and why are they important?


A direct answer here would be much larger than "Quora sized”. This is because Doug 
Engelbart’s conceptions were qualitatively different than most of the much tinier ideas 
and visions of today’s developers. 


The wonderful Bret Victor wrote the best obituary for Doug shortly after he died:
“A few words on Doug Engelbart”.


It's worth reading the entire piece. Two good extracts are: 


“When | read tech writers’ interviews with Engelbart, | imagine these writers interviewing 
George Orwell, asking in-depth probing questions about his typewriter” 


and 


“Almost any time you interpret the past as "the present, but cruder", you end up missing
the point. But in the case of Engelbart, you miss the point in spectacular fashion.


Our hypertext is not the same as Engelbart's hypertext, because it does not serve the 
same purpose. Our video conferencing is not the same as Engelbart's video conferencing, 
because it does not serve the same purpose. They may look similar superficially, but they 
have different meanings.” 


The best way to get answers to the Quora question is to read Doug's original proposal 
from 1962, i.e. 57 years ago, submitted while Ivan Sutherland was inventing interactive 
computer graphics (and Ivan is mentioned in the proposal).


[Image: cover page of the report “Augmenting Human Intellect: A Conceptual Framework”,
Stanford Research Institute, California]


http://www.dougengelbart.org/pubs/papers/scanned/Doug_Engelbart-AugmentingHumanIntellect.pdf


Alan Kay
Still trying to learn how to think better - 4y 


What's arguably the single most amazing thing that computers have made 
possible? 

| think the answer to this is very like the answer to “what is the single most amazing 
thing that writing (and then the printing press) have made possible”. 


It is not that writing and the press made possible a very different kind of travel 
through time and space — this is wonderful and important — but that a new manner 
of traveling through ideas was a consequence of what it meant to learn to read and 
write fluently. Many studies have shown that a literate culture thinks in ways that are 
qualitatively different than traditional oral cultures, and that the correlation between 
writing and civilizations is not a coincidence. 


Further qualitative changes came from the advent of printing, and both of these 
changes are a bit puzzling since each was originally a kind of automation of what 
came before: writing of speech, and printing of writing. In both cases it was the “what
elses?” that made the difference. The “what elses?” have to do with the “what's 
differents” that happen when one gets fluent with any kind of tool, especially those 
that carry ideas as well as actions. 


There's a lot more to all this than fits in a Quora-sized answer, but for a start, consider 
what writing and then printing mean for both description and argumentation: new 
ways of writing and reading in form, extent, organization and kind of content are now 
available, and these co-evolve with new kinds of ideas. 


In this light we can ask about what computers bring that is qualitatively new and 
important. Consider what it means to not only describe an idea, but to also be able to 
simulate it, to carry it out, to explore its implications and hidden assumptions in ways 
never possible before. JCR Licklider, who set up the original ARPA research that has
led to today's personal computing and pervasive networking technologies, wrote in
1960 (paraphrasing a bit): “In not too many years the relationship between humans
and computers will think as no humans have thought before”.


This vision was first analogized to complementary tools and vehicles, but was soon 
seen as the much larger idea of a change in types of communications and kinds of 
thought that would be as revolutionary as those brought by writing and the press. 


To understand what has happened, we only need to look at the history of writing and
printing to note two very different consequences: (a) a vast change over the
last 450 years in how the physical and social worlds are dealt with via the inventions of
modern science and governance, and (b) that most people who read at all still mostly
read fiction, self-help and religion books, cookbooks, etc.* (all topics that would
be familiar to any cave-person).


One way to look at this is that when a powerful new medium of expression comes
along that was not enough in our genes to be part of traditional cultures, it is
something we need to learn how to get fluent with and use. Without the special
learning, the new medium will be mostly used to automate the old forms of thought.
This will also have effects, especially if the new medium is more efficient at what the old
did: this can result in gluts that act like legal drugs (as with the industrial
revolution’s ability to create sugar and fat, it can also overproduce stories, news,
status, and new ways for oral discourse).


On the other hand, pretty much all science and engineering is only possible with 
computers, and mostly because of the computer's ability to actively simulate ideas 
(including the “idea of thought" itself) coupled with the great extensions to what the 
printing press has already wrought. 


Einstein noted that "We cannot solve our problems with the same level of thinking 
that created them”. We can use computers to help in new ways to solve many of our 
largest problems. 


On the other hand, we will be in terrible trouble if we use computers to create new 
levels of problems for which we don't have the stronger levels of thought to avoid and 
repair. A good analogy is that “nuclear weapons are dangerous in any human hands”, 
but “nuclear weapons in the hands of cave people are much more so”. 


A great line by Vi Hart: “We must ensure that human wisdom exceeds human power". 


And we don't acquire wisdom without considerable effort, especially with children 
who are just starting to form their ideas about the world they were born into. 


* See "most read books over last 10 years in the US" e.g. from USA Today website. 


Alan Kay


Have designed a few programming languages - 4y 


What is extensibility in object-oriented programming? 


Programming languages have appearances ("syntax"), meanings ("semantics"), and
efficiencies ("pragmatics"). A really good “extensible language” will allow each of these 
to be extended (and in large ways when this is a good idea). 


It's worth noting that the procedures and functions of an Algol-like language (C is an
example) allow new operations to be programmed and invoked by names in ways that
are parallel to the built-in operations. In some languages, some of the symbols can
have generic meanings and be represented by more than one concrete meaning (for
example, floating point arithmetic in most languages uses the same symbols as
integer arithmetic). Some languages allow more meanings to be given by the
programmer — for example to define complex number arithmetic. Some languages
allow existing symbols — like "+" — to be used — “overloaded” — for this, while
others require a different name to be used.
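
As a small illustration of that last distinction, here is a sketch in Python, which happens to allow overloading. The class name `Complex2` and the function `complex_add` are made up for this example (Python already has a built-in complex type); the point is only to show the same meaning reached through an overloaded "+" and through a separate name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Complex2:
    """Illustrative complex number; the name is hypothetical."""
    re: float
    im: float

    def __add__(self, other):
        # Gives the existing "+" symbol a new concrete meaning.
        return Complex2(self.re + other.re, self.im + other.im)

    def __mul__(self, other):
        # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        return Complex2(self.re * other.re - self.im * other.im,
                        self.re * other.im + self.im * other.re)


def complex_add(a, b):
    """What a language without overloading would require: a separate name."""
    return Complex2(a.re + b.re, a.im + b.im)


a, b = Complex2(1, 2), Complex2(3, -1)
print(a + b)
print(a * b)
print(complex_add(a, b) == a + b)
```

This extends meaning only. Appearance is limited to the symbols Python already parses, and efficiency is whatever the interpreter provides, which is part of why such facilities fall short of a "really good" extensible language in the sense used here.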


A really good extensible language will allow considerable control over appearance, 
meaning, and efficiencies. From this standpoint, it's hard to find really good extensible 
languages in use today. 


It is worth pondering the additional responsibilities that come with really extensible 
languages. First, experience with these in the 60s and 70s shows that even good 
programmers tend to come up with poor extensions in the heat of programming and 
debugging. The result is a gobbledegook that is even worse than average code. 
Extensions are meta-designs and need more thought and skill than many 
programmers are willing to develop. 


On the other hand, really good designers using a language with a really good meta- 
system can be highly effective in just the opposite direction of gobbledegook. Still, 
even good extensions can impose a learning curve that may incur more human cost 
than the benefits (this is the flip side of the benefits of DSLs as well, and a good 
extensible language can make excellent DSLs). 


The best extensible languages | saw in the 60s were Ned Irons’ IMP (CACM Jan 1970) 
and Dave Fisher's CDL (CMU thesis 1970). These were both procedural, and were 
mostly concerned with appearance and meaning. In thinking about “objects” as 
“intercommunicating processes via messages” it occurred to me that if an object could 
receive a message as it saw fit, then each kind of object could have a minilanguage for 
its interface. This fit well into how Lisp 1.5 (and later) were starting to use FEXPRs 
(giving the unevaluated arguments to a function to the function so it could figure out 
what to do). Later, in the 70s, this idea — at the string level — was the basis of Doug
McIlroy's “pipes” programming ideas.


| used a variant of the Lisp idea plus a few insights of Dave Fisher in the first Smalltalk 
design. The body of a “class” definition was a kind of parser that allowed a wide 
variety of forms to be devised. This worked quite well (and it also had the “Tower of 
Babel” characteristics that required some care to make the extensions be of great 
benefit rather than just creating a lot of noise). 
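
A rough sketch of the "each object parses its own messages" idea, in Python rather than the original Smalltalk. Everything here (`Turtle`, `Account`, `send`, and the two tiny message grammars) is invented for illustration and is not taken from the first Smalltalk; it only shows a sender forwarding unevaluated tokens and each kind of object giving them its own interpretation.

```python
# Hypothetical sketch: each object is handed the raw, unevaluated message
# tokens and decides for itself how to parse them.

class Turtle:
    def __init__(self):
        self.x, self.y, self.pen_down = 0, 0, True

    def receive(self, tokens):
        tokens = list(tokens)
        while tokens:
            word = tokens.pop(0)
            if word == "go":                       # "go <dx> <dy>"
                self.x += int(tokens.pop(0))
                self.y += int(tokens.pop(0))
            elif word == "pen":                    # "pen up" / "pen down"
                self.pen_down = tokens.pop(0) == "down"
            else:
                raise ValueError(f"Turtle does not understand {word!r}")
        return self


class Account:
    def __init__(self):
        self.balance = 0

    def receive(self, tokens):
        verb, amount = tokens[0], int(tokens[1])   # "deposit 10" / "withdraw 4"
        if verb == "deposit":
            self.balance += amount
        elif verb == "withdraw":
            self.balance -= amount
        else:
            raise ValueError(f"Account does not understand {verb!r}")
        return self


def send(obj, message):
    """The sender never interprets the message; it only forwards the tokens."""
    return obj.receive(message.split())


t = send(Turtle(), "go 3 4 pen up go 1 1")
acct = send(send(Account(), "deposit 10"), "withdraw 4")
print((t.x, t.y, t.pen_down), acct.balance)
```

The "Tower of Babel" risk is visible even at this size: nothing forces the two little grammars to resemble each other, so a reader has to learn each one separately.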


For deep meanings and efficiencies, some of the pioneering work was done by Brian
Smith's “3-Lisp” reflection ideas, the later Smalltalks, and the “Meta-Object Protocol”
ideas of Kiczales, Bobrow, des Rivières, etc. that had Common Lisp as a target. The approach
of the latter was especially nice because it allowed the compiler to see deep extensions
and, when this was a good idea, incorporate them at the same levels as the most critical
deep code. Fisher had a few elements of this as well.


A key idea here is that when “meta” is made available, really great debugging and 
graceful dealings with errors are critical. This should include “fences” that initially hold 
off meta-changes unless the programmers are really sure that they want to deal with 
these levels. 


The philosophy here is a subset of a larger Parc observation about times “when a 2nd 
order theory is the opposite of a 1st order theory, and they are both true”. For 
example: the 1st order theory about not diving down the black hole of making one’s 
own hardware and software tools is true — but the 2nd order theory “if you have the 
chops to make your own tools then you should” is also true. Parc was partly based on 
making the 2nd order theory work. 


Alan Kay


Have designed a few programming languages - 4y 


Why do you need to use command prompt? 


There are a number of good end-user solutions using GUIs that lie between pure text 
command lines and highly graphical programming. 


With respect to graphical programming, it's important to note that Ivan Sutherland's 
Sketchpad — my vote for the first real interactive graphics system (1962) — was 
programmed graphically (in terms of telling the system to "figure out how to make 
this happen” — using constraints). 


Ivan Sutherland making a bridge simulation from scratch by drawing and 
programming with constraints. 
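
To give a feel for "tell the system what should be true and let it figure out how", here is a minimal relaxation-style sketch in Python. It is an illustration of constraint satisfaction in general, not Sketchpad's actual solver; the function name `relax`, the point names, and the choice of distance-only constraints are all assumptions made for this example.

```python
import math


def relax(points, pinned, constraints, iterations=200):
    """Nudge points until each (a, b, length) distance constraint is nearly met."""
    pts = {name: list(xy) for name, xy in points.items()}
    for _ in range(iterations):
        for a, b, length in constraints:
            (ax, ay), (bx, by) = pts[a], pts[b]
            dx, dy = bx - ax, by - ay
            dist = math.hypot(dx, dy) or 1e-9
            error = (dist - length) / dist
            # Pinned points do not move; otherwise the two ends share the fix.
            wa = 0.0 if a in pinned else (1.0 if b in pinned else 0.5)
            wb = 0.0 if b in pinned else (1.0 if a in pinned else 0.5)
            pts[a][0] += dx * error * wa
            pts[a][1] += dy * error * wa
            pts[b][0] -= dx * error * wb
            pts[b][1] -= dy * error * wb
    return pts


# The user only states what must hold: A and B stay put, AC = 3, BC = 5.
start = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (1.0, 1.0)}
solved = relax(start, pinned={"A", "B"}, constraints=[("A", "C", 3.0), ("B", "C", 5.0)])
print(solved["C"])
```

The program never says where C goes. It only states the two lengths, and repeated small corrections settle C at a position that satisfies both.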


Ivan's older brother Bert — after getting out of the Navy — did a PhD thesis (ca 1965)
called “Online Graphical Specification of Procedures”, which is my vote for the first
data flow and “wired up” components graphical programming system.


A graphical program for square root that creates a new component (in the upper 
left quadrant) using connectable dataflow programming. 
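
The picture is missing here, so as a stand-in, this is a textual sketch in Python of what "wiring up" a square root from connectable components might look like. It assumes Newton's method as the algorithm, which may not be what the original diagram used, and `Node`, `Source` and `make_sqrt_component` are hypothetical names for this example only.

```python
class Node:
    """A component with input wires; it computes when its output is pulled."""

    def __init__(self, fn, *inputs):
        self.fn, self.inputs = fn, inputs

    def value(self):
        return self.fn(*(wire.value() for wire in self.inputs))


class Source(Node):
    """An input terminal whose value is set from outside the diagram."""

    def __init__(self, initial=0.0):
        self.current = initial

    def value(self):
        return self.current


def make_sqrt_component(steps=20):
    """Wire divide, add and halve nodes into one Newton step, then wrap the
    diagram up as a new reusable component (a plain function here).
    Intended for inputs x >= 0."""
    n, guess = Source(), Source()
    quotient = Node(lambda a, b: a / b, n, guess)
    total = Node(lambda a, b: a + b, guess, quotient)
    better = Node(lambda a: a / 2.0, total)

    def sqrt_component(x):
        n.current, guess.current = x, max(x, 1.0)
        for _ in range(steps):          # feed the output back into the input
            guess.current = better.value()
        return guess.current

    return sqrt_component


square_root = make_sqrt_component()
print(square_root(2.0), square_root(144.0))
```

The three inner nodes correspond to boxes joined by wires, and `make_sqrt_component` plays the role of packaging the whole diagram as a new box that can itself be wired into larger programs.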


Just a little later (ca 1968) a very usable flowchart programming system — GRAphical 
Input Language (GRAIL) — was done at RAND by Tom Ellis et al. 


This system was capable of larger scale programming as well as simple programming 
for end-users (which might be a single screen-full of wired up things to do). Each 
element can be tapped on to reveal its programmed meaning, also in terms of flow 
diagrams. This allowed them to program the entire GRAIL system in itself. For 
debugging, the diagrams can be run slowly (there was a speed control) and would 
show the actions through the diagrams step by step. 


Another constraint system that followed in the footsteps of Sketchpad, was ThingLab 
by Alan Borning (ca 1977) done in Smalltalk at Parc. 


Another early dataflow system was Fabrik, done by Dan Ingalls et al. in Smalltalk in the 
early 80s. 


An exemplary system at “industrial strength” was the graphical dataflow system
“MetaFour” done in the late 80s by Parc alums Dave Liddle, Yogen Dalal, et al. which
was used heavily for a number of years by large companies — e.g. Procter and
Gamble — initially for “data mining”, and then for many other applications. The
end-users graphically connected “data sources” (each represented as a “node”) to “data
filters” and “data massagers” (also represented as nodes). The system was supplied with a
large repertoire of useful nodes — some of which allowed further programming in
various ways, including the ability to encapsulate a whole dataflow diagram system.
This system was nicely matched to the end-users and was supplied with large CRTs
that provided enough real-estate for the graphical programs.
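
The source, filter and massager arrangement can be sketched textually with Python generators. This is not MetaFour's design or API, which was graphical; the function names and the sample sales records are invented purely to show sources feeding filters feeding massagers, and a whole chain being encapsulated as one node.

```python
def source(rows):
    """Data source node: yields records one at a time."""
    yield from rows


def filter_node(predicate):
    """Data filter node: passes along only the records that match."""
    def node(stream):
        return (row for row in stream if predicate(row))
    return node


def massage_node(transform):
    """Data massager node: rewrites each record."""
    def node(stream):
        return (transform(row) for row in stream)
    return node


def encapsulate(*nodes):
    """Wrap a whole chain of nodes so it can be reused as a single node."""
    def node(stream):
        for n in nodes:
            stream = n(stream)
        return stream
    return node


sales = [
    {"region": "east", "units": 12, "price": 2.5},
    {"region": "west", "units": 0, "price": 4.0},
    {"region": "east", "units": 3, "price": 10.0},
]

nonzero_revenue = encapsulate(
    filter_node(lambda r: r["units"] > 0),
    massage_node(lambda r: {"region": r["region"], "revenue": r["units"] * r["price"]}),
)

report = list(nonzero_revenue(source(sales)))
print(report)
```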


As with all programming systems, graphical programming is generally much easier to
write than to read. Because there is less directionality in two or more dimensions,
reading can be extremely difficult (like reading electronic circuit diagrams).


Alan Borning's ThingLab solved part of this problem by allowing dual mode operation: 
generally one programmed by graphical building, and a visitor would understand by 
moving back and forth between the construction and a textual rendering that was 
more narrative-like. This was a good solution. 


The MetaFour system's solution was also a very good one in that the problem domain 
and how it was handled plus the available hardware, display, etc., were all well 
matched up. 


The advent of wide angle VR could make a big difference. Graphical programming
requires more real estate, and is much more effective with larger viewing angles (this
is also true for good UIs for textual programming, just more so). My guess is that an
attempt at a new system today by really good designers and programmers who are
willing to do the extra UI work needed (these are in very short supply these days),
could produce quite a breakthrough in “more-intuitive programming”.


However, programming is not the same as good design — either for functionality or 
appearance — so at some point computerists really need to buckle down to learn how 
to do good systems design regardless of the means of programming ... 


Alan Kay 
Still trying to learn how to think better - 4y 


Which books should I read to learn about the stories and anecdotes which
led us to "modern" computer science?

I think the 20 or 30 books | have read about this are far less than half the books 
published about the history of computing. Most of the ones | have read are flimsy and 
inaccurate. The easy part of this for me to check was that the books that featured 
incidents | was a direct witness to or a direct part of are quite inaccurate. 


There are a few that are worth the effort. For example, by far the best book about the 
work that led to personal computing, the GUI, Ethernet, Internet, etc. — the work of 
the ARPA/Parc research community, etc. — is "The Dream Machine” by Mitchell 
Waldrop. There are inaccuracies in this history also, but Waldrop spent 5 years 
researching this book, and the result is a pretty good job over all. 


There are a number of technical histories written by the researchers themselves, most 
of which have been well vetted. For example, the "History of Programming Languages” 
series set up and published by the ACM gives a pretty good picture of how individual 
programming languages were invented and made. The ACM also published a “History 
of Workstations” book (and has published the videos of the public talks by many of 
the key researchers). 


As for the term “computer science”, | take it literally and in its original intended 
meaning (where “science” is supposed to mean “science”). How this all came about 
would be a fascinating and worthwhile book, and there could be one that I’m not 
aware of. 


Alan Kay
Still trying to learn how to think better - 4y 
What does Alan Kay think of Dynamicland? 


A terrific and beautiful world of ideas and experiences! 


Alan Kay

Have designed a few programming languages - 4y 
How usable was composition between widgets in Smalltalk 80 on the Xerox 
Alto? Could full applications be developed this way?

The answers to the two questions are somewhat different. First, there weren't really
“applications” of the stove-piped variety people are used to today. Every object was 
encapsulated and could be a component of a system, and every object was viewable 
(and views were also objects (every kind of thing was an object)). 


So, if we look at an Alto screen shot of an early version of Smalltalk (ca 1974), we can 
see a superset of the kind of editor I’m using to write this answer. 


The differences are that | can both use multiple fonts and make my own font 
characters and whole fonts as needed (the little view in the lower left is a view of a 
character that lets one edit it). That would be nice to have these days. 


Each class with visible views would carry the needed editors as part of the class 
definition. 


Another example of this (and another difference) is that the illustration which is
inserted in the galley is being sketched in place — you can see that the UI for this
sketch object has sprouted around it, like a “halo” or “frame”. So “pictures” are also
active entities that can be “edited”. That would also be nice to have these days.


The way the galley editor was made was to create a galley class, whose objects were 
held in a sequence, and whose views were in the same sequence and which subtended 
the size of their visible contents (worth noting that this is the same general kind of 
thing that a string and string view are and a paragraph and paragraph view are). Any 
view of any object can be part of any composition. 
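
A minimal Python sketch of this kind of composition, under the assumption that "size" can be reduced to a height in rows. The class names `StringView`, `SketchView` and `GalleyView` are invented here and the code is not drawn from any Smalltalk source; it only shows views held in sequence, a container whose size is whatever its contents subtend, and a galley used as a component of another galley.

```python
class StringView:
    """View of one line of text; its height is one row."""

    def __init__(self, text):
        self.text = text

    def height(self):
        return 1

    def render(self):
        return [self.text]


class SketchView:
    """View of a picture, shown here as a box of the picture's own height."""

    def __init__(self, rows, width=8):
        self.rows, self.width = rows, width

    def height(self):
        return self.rows

    def render(self):
        return ["#" * self.width for _ in range(self.rows)]


class GalleyView:
    """Holds views in sequence; its size is whatever its contents subtend."""

    def __init__(self, *views):
        self.views = list(views)

    def height(self):
        return sum(v.height() for v in self.views)

    def render(self):
        return [line for v in self.views for line in v.render()]


paragraph = GalleyView(StringView("The Alto"), StringView("screen"))
page = GalleyView(paragraph, SketchView(rows=2), StringView("caption"))
print(page.height())
print("\n".join(page.render()))
```

Because a galley answers the same two questions as any other view, it can sit inside another galley without either one knowing the difference.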


In turn, the galley editor can be used as a component of a more complex organization. 


More fun — and why we invented MVC — is the multiple viewing-editing 
architectures of complex tools. Here is the first one — done by the amazing Trygve 
Reenskaug — showing a planning system where the different views are looking at the 
same plan in different ways (this would be ca 76-77). 


We can see a critical path view at the top right, and below it a “Gantt view”. In the 
lower left is a simple list view of the plan stages objects. A selection there will select 
the same object in all the views that show it. 


The user is about to manifest another view of the same plan. Each of the views is 
independently editable and will cause the desired changes in the underlying model. 
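
The behavior described here (one model, several views, a selection that shows up everywhere, edits that go through the model) can be sketched in a few lines of Python. This is a simplified illustration rather than Reenskaug's design: it leaves out controllers entirely, and `Plan`, `ListView` and `GanttView` are hypothetical names.

```python
class Plan:
    """The model: plan stages plus the currently selected stage."""

    def __init__(self, stages):
        self.stages = dict(stages)      # stage name -> duration in days
        self.selected = None
        self.views = []

    def attach(self, view):
        self.views.append(view)
        view.refresh(self)

    def select(self, name):
        self.selected = name
        self.changed()

    def set_duration(self, name, days):
        self.stages[name] = days
        self.changed()

    def changed(self):
        for view in self.views:
            view.refresh(self)


class ListView:
    def refresh(self, plan):
        self.lines = [("> " if n == plan.selected else "  ") + n for n in plan.stages]


class GanttView:
    def refresh(self, plan):
        self.bars = {n: ("*" if n == plan.selected else "-") * d
                     for n, d in plan.stages.items()}

    def stretch(self, plan, name, extra_days):
        """An edit made in this view goes to the model, not to other views."""
        plan.set_duration(name, plan.stages[name] + extra_days)


plan = Plan({"design": 2, "build": 3})
list_view, gantt = ListView(), GanttView()
plan.attach(list_view)
plan.attach(gantt)

plan.select("build")             # selection made once, shown in every view
gantt.stretch(plan, "build", 1)  # edit through one view updates the model
print(list_view.lines, gantt.bars)
```

Adding another view of the same plan is just another `attach`; none of the existing views need to change.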


As a final example, here is a demo in a revived Smalltalk from 1978 that gives a sense 
of what kinds of things were done back then (this is also the Smalltalk version that 
Steve Jobs saw in his famous visit to Parc). 


About 10 minutes into this there is an example of coordinating two kinds of visible 
objects (painting and animation) to make a richer tool. 


The first question was “how usable was this?” It was pretty usable for developing
programs and systems, and could have been much more usable for casual end-user 
combinations. This is still an interesting set of problems that have not yet been solved 
all together. Hypercard contributed some important ideas for end-user scripting 
(though it wasn't object-oriented enough to allow this kind of composing). Quite a bit 
more in some important areas was achieved by the Etoys system (done in an open 
source version of Smalltalk called “Squeak"). 


Alan Kay
Still trying to learn how to think better - 4y


Why were computers like the DEC PDP-10 called minicomputers, since they 
weren't really that small? 

Originally Answered: Why were computers like the PDP-10 called minicomputers, since they weren't 
really that small? 

The PDP-10 was a “mainframe computer” and was not called a minicomputer. The 
PDP-8 was called a minicomputer, as were most of the early PDP-11s. 


One angle on the name, which to some extent correlated with physical size, was the
“word size” of the machine. Most of the commercial mainframe machines starting in
the late 50s had word sizes of 36 bits to 48 bits. The IBM System 360 computers were 
32 bit mainframes. Etc. 


One of the earliest commercial “smaller” computers was the PDP-1 (introduced ca 
1959-60) which had an 18 bit word, and an instruction set that resembled its ancestor 
Whirlwind (on which Ken Olsen and other eventual founders of DEC had worked in the 
early 50s). I'm pretty sure that we called the PDP-1 a “mini”. 


One of my heroes — Steve Russell — the original implementer of both Lisp and 
SpaceWar — with the restored PDP-1 at the Computer History Museum 


Interestingly — and to the point of the above — the SDS-940, a small computer with a 
24-bit word and half a MIP, was referred to as a “mainframe” (it was the machine that 
ran Engelbart’s “Mother of all Demos”).


Alan Kay
Still trying to learn how to think better - 4y


How did humans discover how to make fire?

We don't know.


Most of the answers that speculate that fire from natural causes was carefully 
preserved long before making it was discovered are almost certainly correct. 


The American Indians did this in a number of ways, and the practice was continued in 
many cultures after humans could make fire, both for ritual reasons — the "keepers of 
the flame” — and for practical reasons — it is generally a lot easier to preserve a 
glowing coal than to make a glowing coal. 


As for discovering how to make fire, it would help most people to first learn how to
make a fire from scratch using three or four known early methods. This will impress
on you the number of things that have to go right to have it work when you are really
trying to do it.


A good one that does have most of the things present, and has a good chance of 
producing the very same kind of thing as the preserved glowing coal, is drilling holes 
in beads, bone, wood, etc. The only answerer that | noticed had mentioned this 
possibility was Mark Werner. 


There are several good YouTube videos showing both a hand drill (it's amazing that 
this works as well as it does) and a bow drill, which is my favorite likely answer to this 
question for most cultures that did discover how to make fire. Drilling something with 
a hand or bow drill has the nice property that the cast off material is finely granular 
and thus really prone to combine frictional heat with oxygen in the air to start 
combustion. The result will resemble the precious preserved glowing coals that the 
“keepers of the flame” are tending. From there the process of getting to a real fire is 
the same: tinder and blowing and then kindling. 


Alan Kay 
Still trying to learn how to think better - Updated Aug 30 


Is 60Hz not fast enough for a VR headset? 


There are a number of human vision characteristics that need to be handled really well 
to make VR a good matchup. 


(1) Foveal vs Peripheral acuity - most of the acuity of a human eye is in the small 
foveal region of about 2x3°. We are pretty much legally blind outside this region. For 
vision, our eyes saccade (scan) the visual field and our brain stitches this info
together and combines it with what we believe is there to create what we call “reality” 
(which sometimes is not close). 


McDonnell-Douglas did a VR headset years ago that tracked the eye and put hi-res 
into the foveal region. If this is not done, then very high res needs to be used for the 
visual field that saccades can reach. 


(2) Foveal vs Peripheral sensitivity to brightness change — to help deal with the large 
differences between foveal and peripheral acuity, the peripheral vision is many times 
more sensitive to a brightness change. These also cause the eye to saccade to bring 
the fovea to bear on what might have caused it (this is probably a sensitivity to “attack
from behind”). This can happen for some people at about 120 Hz and above, and most
will have some sensitivity at 90 Hz. 


And, note that the need for wide angle in VR (and in physical displays for computers, 
TV and movies) will affect the peripheral sensing much more than narrow angles. 60Hz 
was originally picked for several reasons: a big one was that it worked well enough for 
tiny TV screens across a living room. 


With wide screens and low frame rates the viewer's eyes will be saccading violently to 
the corners of the screens, and this can cause vertigo and headaches. This can be 
made extreme by also having large pans and zooms in the content, either by head 
movement or by movie/TV content (these force brightness changes in the periphery
also).


(3) Tracking — lags in tracking can cause vertigo, as can too low a tracking rate (they 
have some overlap). This is often more serious an effect than too low a frame rate for 
the images (and they overlap). 


(4) Focus and Stereo - most headsets still focus on the display element in a way
that the eye recognizes as “close up”. If the viewer wants to look at a “far away” object
their lenses will “focus out” but this won't work. This is also a source of headaches. If
an object “gets close” (within about “8 feet”) the eyes will try to converge, and this can
get painful if the object is presented as “really close”.


Bottom line is that many people are affected in numerous ways by the so far poor 
human factors of VR headsets, and many are willing to put up with them. The variation
in humans means that some will really be affected by one factor or another to the point
of not being able to use them. 


Addendum to (2) — added Aug 29 2023 


It's very worth noting what Jeri Ellsworth — a super creative engineering mind — did 
with the Tilt 5 HMDs (look up “Tilt 5”). She wanted to be able to run her unique table-
top “corner-reflective” display from personal computers that might only have 60fps.
She knew this would not be enough, so she designed and made a special 
interpolating chip that takes in 60fps and outputs 180fps, which, for most cases, will 
provide a much more realistic input to the human eye. The wider the angle of view, 
the more the frame rate needs to be attended to (this was partially advanced for wide 
screen movies by using a spinning shutter that shows each frame of film 3 times — 
but without interpolation (which would help)). 




Why do people say that Ada Lovelace was the first programmer, when 
Babbage wrote the first programs for his own machine? 


Alan Kay 
Have designed a few programming languages - Updated 4y 


Goethe said “We should all share in the excitement of discovery without vain attempts 
to claim priority”. He meant not just “early” but any time an important idea is realized 
by a human being. 


The pop cultures — and to some extent the more developed cultures — try to create 
interest through hype, and “being first” or “claiming first” is one of many ploys. 


What we should all celebrate is that both Babbage and Ada were able to realize and 
understand a number of important ideas that most people today still don't 
understand. We should celebrate Ada’s abilities to explain, to use mathematics and to 
write programs regardless of whether Babbage was also a mathematician and could 
write programs. 


We should mourn that neither was able to get more understanding and support for
these ideas. 


We should regret that Ada died so early. And that Babbage was too stubborn to 
change his initial ideas about how to make a working computer (there were far better 
routes available to him in the last 20 years of his life). 


And so forth. 


And I don't think we should worry much about “what people say” (they say lots of
things for their convenience ...) 


Alan Kay


Have designed a few programming languages - 4y 


What does Alan Kay think about Joe Armstrong claiming that "Erlang might 
be the only object-oriented language" and also his thesis supervisor's claim 
that “Erlang is extremely object oriented”? 


I love Joe Armstrong; we lost a great man when he recently left us.


And, he might be right. Erlang is much closer to the original ideas I had about
“objects” and how to use them. 


However, another way to look at this is to consider “What Is Actually Needed” (WIAN) 
and realize that much more is needed beyond what we are programming in today. 


Joe would most definitely be more in favor of this idea than worrying about what 
either one of us did decades ago. 


Alan Kay


Still trying to learn how to think better - 4y 


Who are the most influential people in the history of computing? 


I think there are too many for a fair account in a Quora sized answer, especially if we
give credit to insights regardless of whether they were followed up. 


I think of “computing” as processes that help one deal with ideas. And this covers a lot 
of ground (including various forms of mathematics, etc.) So: computers are things that 
manifest processes that help one deal with ideas. 


We don't know who invented the abacus — probably parallel inventions in a number 
of cultures — or the later Antikythera device. 


One way to do this is to emulate Physics and pick just a few names that would be on 
any list, and apologize profusely in advance for inadvertent omissions. 


This is my personal pantheon of heroes with regard to written stored program digital
computers. 


Boole made a difference because his ideas were picked up and used by Shannon to 
create a “math” and a “methodology” for designing complex switching logics and
circuits. (I love what Charles Sanders Peirce did with Boole’s book, including
coming up with NAND — but the results were buried in his notebooks and missed 
being noticed and influential.) 


Turing made a difference by creating profound intellectual foundations for stored 
program computing (he didn't invent the idea, but he did invent and explain the 
cosmic scope of the idea). 


I love what Konrad Zuse did, but he was not in any direct historical line. Similarly:
Stibitz. 


Von Neumann made a difference by lending his reputation and interests to making 
real stored program computing devices (much of the contribution here was due to 
others, but JvN was a key factor). 


Leaving out the secret Bletchley Park work, we have Max Newman, Williams, et al at 
Manchester and Maurice Wilkes et al at Cambridge actually getting stored program 
computers working within a few months of each other in 1949. 


This era is crowded with early starts, and includes Eckert and Mauchly with Univac I
(ENIAC doesn't count as a stored program computer), and a number of university
computing builds all based on the “Johnniac” designs (e.g. Illiac I, etc.)


Jay Forrester and Bob Everett created Whirlwind ca 1950, one of the first parallel 
(and fast) computers for real-time interaction, plus displays with a pointing device (the 
light gun). Core memory was invented on this machine and was an enormous 
breakthrough in practical computing. Laning and Zierler did the first higher level
language (George) on Whirlwind, and this was also real-time and interactive.


The first higher level language I liked was Algol 58, which was partly done as a
reaction to both Fortran (John Backus) and the precursor of COBOL (Flowmatic by 
Grace Hopper). Prominent names associated with Algol 58 were Backus, Fritz Bauer, 
Alan Perlis, etc. 


John McCarthy for Lisp and the Advice Taker (late 50s). 
Bob Barton for the B5000 architecture (ca 1961) 
Ivan Sutherland for Sketchpad (ca 1962). 


I'll pick there to stop, both on general principles, and also because I started ca 1961,
and “history” to me was before that time .... 


And also because things get really complicated from the late 50s onwards ... The ACM 
Turing Award (started around 1965) commemorated some of the heroes, but missed 
many more. 


Violating the above good idea for stopping, I think I must mention Margaret
Hamilton for truly singular and important contributions in the period 1961 through
1969. She wound up as the eventual leader of the programming effort for the Apollo
Guidance Computer, and her revolutionary “real software engineering” designs
saved the Apollo 11 moon landing. She has been recognized with the NASA Prize, the
Presidential Medal of Freedom, and a Computer History Museum Fellowship, but these still
seem a bit short of what is deserved.


Alan Kay
Still trying to learn how to think better - 4y


Is traditional engineering becoming a pop culture due to the modern 
practice of software development? 


I don't know, and I suspect not in the way that computing has been going.


Much of traditional engineering is life threatening unless done carefully and 
scrupulously, and most engineers really take their calling seriously. 


For example, I think that most engineers are just sick about the 737 Max horrible
doings, and the full story is likely to be quite ugly when all is revealed. Certainly, 
almost no traditional engineers would ever cut corners, create single points of failure, 
omit user training, and all the rest. 


Another part of traditional engineering is that much of it deals with the physical world 
in which materials cannot stand up to strong forces — so care and craft have to 
happen. In the computer, the “forces” are weak, and the gossamer cobwebs can take 
quite a while to crash — and most crashes in the past have not had human lives at 
stake. 


Looking into the future, more and more software is starting to affect human safety, 
and more traditional engineering care will be needed. I don't see that “computing” has
much of a tradition of being careful in this way, and certainly the systems building 
tools and methods are not very good at helping “real engineering practices”. 


Alan Kay
Still trying to learn how to think better - 4y


Is Archibald Putt a creation of Alan Kay? 
Not this Alan Kay ... 


What are Alan Kay's views on automation and technological 
unemployment? 


Alan Kay 
Still trying to learn how to think better - 4y 


I think the issues revolve around a society's conception of itself, its main societal goals, 
and the tradeoffs between our genetic heritage, the traditional cultural views of 
“reality” and mores, and the more recent inventions that are more distinct from the 
past (modern science, ideas about equality and human dignity, etc.). 


“Ethics” and “morality” can use human sciences to help reason, but in the end they are 
more like math in that they are generally formed by choosing basic principles and 
then using reason to ramify out ideas about good behavior. 


For the current question, we could imagine choosing as one of the basic principles of 
a society, the idea and goal of human dignity (this was a frequent and useful theme of 
Martin Luther King). 


We can look at the variation of abilities, talents, motivations, circumstances, etc. in a 
population and ask how can a society permit generally useful changes while 
preserving its values? (I'm picking “dignity” here as a stand in for all of them). 


So we need to ask about the general propensities and abilities of adults to 
qualitatively change their behaviors in mid-life, and ask about the abilities of schools 
to change how they are trying to shape and help children to become adults (and no 
small part of this requires the adults in schooling to make similar large changes as 
above). 


To oversimplify, we can think of societies as having something like a viscosity that 
damps down accelerated changes in attitude. Whereas science and technology have
much less "viscosity": these are mainly based on knowledge, method, and technique.
They were quite opaque just a few hundred years ago, but this is because the "how"
was invisible. Now that "how" has been unveiled, it is relatively easy to make progress.


We are in the middle of mismatches between older rhythms of living — including 
ideas of cooperation and competition — and the new methods. The mismatches do 
and will cause great disruption and unhappiness for some large parts of a society, so 
the question could be “should we try to smooth out unhappiness by distributing it 
more evenly, or should we allocate it just to one sector so that everyone else shall be 
‘happier’?”


Rampant capitalism allocates happiness in one direction, socialism and communism 
allocate differently. From the standpoint of systems design, none of these past 
systems have been set up well — in part because they ignore the larger factors 
mentioned in the beginning here. 


To bring this back to our current situation, systemic change for the majority of human 
beings is mostly generational, and requires generational changes in the education 
system. If dignity is one of the values that need to be preserved, then the society has 
to make plans and allocate resources to head off simple but catastrophic disruptions 
that simple but catastrophic competition and US versions of capitalism encourage. 


The current drivers of society almost completely miss that civilization and progress 
were set up in very large part to transcend human genetics — in other words, we 
humans want and need much more than “survival of the fittest”. We don’t want bad 
eyesight to doom a person — we invent eyeglasses. Similarly we don’t want “bad 
eyesight” to doom a society — we have started to invent better ways to organize 
societies and to give us “more sight” through science, deeper thought, and more 
careful planning and simulation.


We have invented larger values than “survival of the fittest” and we need to find the 
means and the will to make them flourish in a richer conception of society. 


Alan Kay
Still trying to learn how to think better - 4y
How and why did Xerox fail to capitalise on PARC's inventions? 


There are good enough answers under this Quora link (this is an example of a 
question that has already been asked and answered). 


Why couldn't the management of Xerox see the goldmine that they had with 
inventions at Xerox PARC? Was it really about lacking money to capitalize? How about 
stock issuance? 


Alan Kay
I am the Alan Kay in question. - Updated 4y


What does Alan Kay think of the Tablon and similar portable practice
devices (I carry mine with me almost everywhere)?


About 6 or 7 years ago my left thumb base joint gave up the ghost (arthritis, bone 
spurs, loss of cartilage, etc) and I had to give up guitar. About a year ago I heard a
wonderful lute player, Thomas Dunford, and was so taken with his playing that I asked
the Lute Society in the UK about
thumb-bracing on lutes and was told that it is much less than on a guitar. 


I eventually wound up with a really great instrument: a 7 course Renaissance style
lute of the kind that was starting to be played around 450 years ago as the 
Shakespearean age and the golden age of English lute music was starting. 


It is as light as a feather and very portable compared to a guitar, so I have been taking
it with me on my travels. I've been able to avoid re-inflaming my thumb joint by 
paying constant attention to how it is feeling (it is always signaling) and stopping to 
give it a rest many times each day. This is working pretty well, and it is a wonderful 
experience to get back to an instrument that is more intimate and expressive than 
keyboard instruments after thinking I would never be able to play these again.


It is surprisingly different to learn to play compared to a classical guitar (though there 
is some transfer). The right hand and arm are oriented so the fingers point up along 
the strings (rather than across from the top as in a guitar). The playing is with 
fingertips rather than with nails, and one is playing on double strings (except for the 
top string). The range of tone and nuance (and difficulty) seems to be somewhere in 
between a guitar and a violin (one feels that one’s fingers are each a little bow). The 
direct touch of fingertips on both hands is extremely intimate and pleasurable. 


The frets are made of gut, and tied on. They are much lower than guitar frets and this 
requires quite a bit of relearning of the left hand — in many cases, especially for 
chords, you have to get your fingertips closer to the frets to get a clear good tone. 
Not quite as demanding as a violin with no frets but a lot more demanding than a
guitar. A super benefit of the gut frets is that they can be moved to get pesky intervals
(like 3rds and 10ths) into great tune (the poor tunability of guitars, which are set up
for equal tempered intervals, has always bugged me). On the lute you can adjust
(slant, even) a fret or two to make the sound so much nicer.
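
To put a rough number on the tuning complaint, here is a small sketch of the arithmetic. It assumes the "in tune" target for a major 3rd is the pure 5:4 frequency ratio (an assumption for illustration; the answer does not name a tuning system), and compares that with the equal tempered 3rd that fixed guitar frets produce. The function name `cents` is just a helper made up for this example.

```python
import math

def cents(ratio: float) -> float:
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

# Equal temperament: each fret multiplies the frequency by 2**(1/12),
# so a major 3rd (four frets) is 2**(4/12).
equal_major_third = 2 ** (4 / 12)

# Assumed "pure" target for the same interval: the 5:4 ratio.
just_major_third = 5 / 4

error_cents = cents(equal_major_third) - cents(just_major_third)

print(f"equal tempered major 3rd: {cents(equal_major_third):.1f} cents")
print(f"pure 5:4 major 3rd:       {cents(just_major_third):.1f} cents")
print(f"equal tempered is sharp by about {error_cents:.1f} cents")
```

A 10th is an octave plus a 3rd, so under the same assumption it carries the same offset. Nudging or slanting a tied gut fret is one way to shave off part of that difference for the keys a piece actually uses.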


I should say a word about lute timbre and “presence”. This has been the largest
revelation and surprise. If you play 3 or 4 part harmony on a piano the sound is “thin” 
compared to 4 brass instruments playing the same notes. Guitar is “pretty darn thin” 
compared to most orchestral instruments. 


The lute is not very loud and doesn't have a lot of sustain, but it is amazingly “not
thin”. There is something about the timbre that is “memorable” so that a line that is
broken by notes dying away or by hand movements, can still be sustained in the
listener's mind because of the continuity of “presence” in the timbre (you can hear this
very distinctly in Dunford’s playing). Sparse voicings that would not sound great on
guitar sound very good on the lute. I find this quite magical as I'm gradually learning
to play some of these pieces. 


Some lute pieces demand quite a bit of technique, but a Tablon for the lute wouldn't 
help what the right hand has to learn. But I've found that the light weight and small 
size of this style of lute is very amenable to traveling when I have to.


Alan Kay
Have designed a few programming languages - 4y

How can I help develop a wiki-Pharo? (Many people using and editing the
same Pharo environment image in real time.) 


I suggest you not worry about Pharo (or any other programming environment)
but do ponder scaling, “number of cooks”, dependency, and coordination issues.


One of the ways to do this is to look at complex structures, both in nature (Biology) 
and human-made (cities, countries), to puzzle out how and why the scalings worked. 


Cities might be a good place to start. How does “independence of thought” trade off
with “coherent esthetics”, “needed services”, etc.?


What degrees of freedom should be allowed for different levels of things and 
processes? For example, can anyone in a city just decide on a whim to change the 
voltage in electric power services? Can a city law require all houses to look the same? 
Can a city law require certain styles to be followed? Or: can a city law require that no 
locks be allowed on private dwellings? Or no privacy ... etc. 


Most programming languages have ways to request services. A key issue is what 
services can be depended on. The ones made locally are under local control (and
can still have bugs or be "improved" in ways that were not assumed by the service
user) but these are relatively easy to deal with. But how about services that are
supplied by the environment? How does one know that one’s assumptions about the
services are accurate enough to allow use, and how can one re-vet when a global
service has a new version?


It is pretty annoying that in 2019, there is not a typing/interface scheme that is 
semantic enough to deal with the coordination of remote services and dependencies. 


If I were going to try to do what your question asks, I would aim at the two big
problems of (a) “governance” and (b) assumptions and dependencies. 


Then I would make a programming system and environment that fit these needs. The
methods used to invent Smalltalk (Pharo is a version of Smalltalk) at Xerox Parc will
work well for creating What Is Actually Needed (WIAN). I have outlined some of the
first steps above ... 


Alan Kay
Have designed a few programming languages - Updated 4y


Rich Hickey, creator of Clojure has said that object-oriented programming is 
unable to model “time” properly. What does "modeling time" actually 
mean? 


Most languages don't model time — and worse, let the CPU determine what time 
means. 


But the need for it, and a variety of solutions have been around for at least 60 years 
now. 


Part of John McCarthy's long ago insight (that after 60 years I wish computer people
had grokked by now) is that you just have to model time along with everything else
that is modeled. Then you wind up with parallel histories that have both the memory
of things that we want and need and the memories of when. This renders
“mutable” and “immutable” moot (“mute?”) and beside the point.


John's desire to have both unambiguous reasoning about situations and actions and 
causes and to have changes over time led to his powerful ways to deal with these. I
think most of the confusions today stem from too limited a view on computing — 
very likely from taking programming practices as “computing” (whereas they are 
distressingly distant from each other at the present time). 


I'll get back to this in a bit. It's worth noting that forms of dealing sensibly with time as 
histories have existed in the database world for many years, in the forms of versioning, 
atomic transactions, etc. The programming language Lucid had all variables retain 
histories, etc. 


All of these allow one to picture the progress of a whole system as the world lines of
what it uses for state changes. Could be “data”, could be “objects”.


A key notion and issue that arises is to ask questions about “consistent states” in the 
history (McCarthy did). If we think of the world lines as progressing upwards, then 
“consistent states” are horizontal “layers” or “surfaces” (John called them “situations”;
AI sometimes called them “frames”). The idea is that we want to know the
pseudotimes when the whole system can be considered consistent enough to allow 
reasoning. The progression of “situations” represents the transitions in what is 
essentially a phase space. From John’s point of view, the identity of a situation allows 
logical reasoning to be done consistently. 


Note: the data base world has very ad hoc to no solutions to “situations” across data 
bases (but it’s clear that deductions drawn from data bases with different dates and 
versions might not be reasoning within a situation). 


My long ago approach to “objects” was partly inspired by Sketchpad and Simula I,
both of which had what could be called a “simulation approach” to modeling things.
(We can see that this is quite compatible with what John was trying to do with fluents 
and his robot programming and reasoning.) If you have a simulation approach and 
use fluents and situations, then you can think about getting your objects to the next 
consistent situation by letting them use the current situation as inputs to what is 
essentially a kind of functional programming (or logical reasoning) to 
“deduce/produce” the next consistent situation. 
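
A minimal sketch of that last idea, in Python. Every name here (`World`, `Situation`, `step`, `at`) is hypothetical and made up for illustration; this is not McCarthy's formalism, nor Croquet's design. It only shows the shape: each object computes its next state from the current consistent situation as a pure function, the new situation is appended to a retained history, and any earlier pseudotime can still be inspected.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A "situation" here is one consistent snapshot: object name -> state.
Situation = Dict[str, float]

@dataclass
class World:
    # One rule per object: given the whole current situation, return
    # that object's state in the next situation. Rules never mutate.
    rules: Dict[str, Callable[[Situation], float]]
    # Retained history of situations; the index plays the role of pseudotime.
    history: List[Situation]

    def step(self) -> Situation:
        current = self.history[-1]
        # Every object reads only the current situation, so nobody can
        # observe a half-updated world while the next one is being produced.
        nxt = {name: rule(current) for name, rule in self.rules.items()}
        self.history.append(nxt)
        return nxt

    def at(self, pseudotime: int) -> Situation:
        return self.history[pseudotime]

# Two objects that each want the other's value. With in-place updates this
# classic swap goes wrong; with situations it is trivial.
world = World(
    rules={"a": lambda s: s["b"], "b": lambda s: s["a"]},
    history=[{"a": 1.0, "b": 2.0}],
)
world.step()
print(world.at(0))  # the earlier situation is still there
print(world.at(1))  # the next consistent situation
```

Note that the question of "mutable" versus "immutable" never comes up in the rules themselves: change is represented by the progression of situations, and the "when" is part of the model.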


We really wanted to do this at Xerox Parc but our computers were much too small 
even for relatively short histories to be retained. However, various systems have done 
this since. For example the Croquet system done by Viewpoints about 15 years ago. 


And those who are interested in this idea should definitely take a look at Leslie 
Lamport's many papers on these issues, and David Reed's 1978 MIT Thesis on “NETOS 
— an operating system for the Internet”. And at David Jefferson’s work. Etc. 


When one starts looking at things from this point of view, it should become apparent 
that object techniques (using encapsulation, messaging, reflection, etc.) are well suited 
to being upgraded to having a full model of “time”. 


Alan Kay
I am the Alan Kay in question. - 4y

How can I learn more about the original spirit of computing (as alluded to
by Alan Kay, et al.)? 

I take “the original spirit of computing” to mean “for the fun and the art of it”, and the
“art” part meaning to create better works of art at all levels, especially the
foundational levels.


If we take “computing” as “manipulating information to satisfy goals”, then it's been 
around for quite a long time, both for “fun and art” and for “purposes less fun” (for 
example for business and the military). 


So, I think of “the original spirit” as always coexisting with much more pragmatic
applications. And I think it is still around in various forms. An interesting question
would be whether the percentage of people doing “original spirit” stuff is smaller than
in times past (I'm guessing yes). 


Another question is can those who want to do “original spirit stuff" find avenues for 
this? 


I think the answer is generally “yes”, and also in many ways “easier than in the past”.
This is because the casual amount of computing capacity available to everyone in a
laptop, etc. is enough to do almost any kind of prototype. Just for one example, take a
look at Anselm Eickhoff's “Citybound” simulation/game system programmed by him
(via making a new kind of object system and then building millions of concurrent 
objects on that, and all running on a laptop). 


This is enough local capacity to prototype ideas of Internet scale. 


For HW assist needs, it is possible to buy (for about the cost of another laptop) FPGA
accelerators, which can plug into your laptop, and on which supercomputers can be
defined to help one’s software needs.


The tradeoffs in the past were (a) tiny slow hard to build expensive machines vs. (b) 
great exploratory research funding from ARPA, ONR, etc. 


That great funding helped to bring the vast resources of today but the great funding 
for small research teams is not present at this time. 


But if you want to see “The spirit of computing” alive and well today, take a look at 
Anselm above ... 


Alan Kay
Still trying to learn how to think better - Updated 4y


Experienced programmers and computer scientists, what are some really 
old (or even nearly forgotten) books you think every new programmer 
should read? 


I love that "2006" and "2008" (in another answer) must be considered “really old”
(which is what the question requests) ... 


I'm still a big fan of the “Lisp 1.5 Programmers Manual” (MIT Press, still in print).
This version of the language is no longer with us, but the book (first written ca 1962
by John McCarthy, who invented it, and his colleagues, who implemented it) is a perfect
classic.


It starts with a version of John’s first papers about Lisp, and develops the ideas in a 
few pages of examples to culminate on page 13 with Lisp eval and apply defined in 
itself. There are many other thought provoking ideas and examples throughout the 
rest of the book. 
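
To give a feel for the shape of that page, here is a rough sketch in Python rather than Lisp. It is not a transcription of page 13: the names `lisp_eval` and `lisp_apply`, the use of Python lists for S-expressions, and the choice to have `lambda` capture its defining environment (Lisp 1.5 itself used dynamic binding) are all assumptions made for this illustration. What it does share with the original is the mutual recursion: eval turns forms into values, and apply runs a function on already-evaluated arguments by calling eval on the body.

```python
import operator

def lisp_eval(expr, env):
    """Evaluate an S-expression written as nested Python lists."""
    if isinstance(expr, str):          # atom: look up its value
        return env[expr]
    if not isinstance(expr, list):     # number: evaluates to itself
        return expr
    head = expr[0]
    if head == "quote":                # (quote x) -> x, unevaluated
        return expr[1]
    if head == "cond":                 # (cond (test branch) ...)
        for test, branch in expr[1:]:
            if lisp_eval(test, env):
                return lisp_eval(branch, env)
        return None
    if head == "lambda":               # (lambda (params) body)
        return ("closure", expr[1], expr[2], env)
    fn = lisp_eval(head, env)          # ordinary call: evaluate everything
    args = [lisp_eval(arg, env) for arg in expr[1:]]
    return lisp_apply(fn, args)

def lisp_apply(fn, args):
    """Apply a primitive or a closure to evaluated arguments."""
    if callable(fn):                   # primitive supplied by the host
        return fn(*args)
    _, params, body, env = fn
    return lisp_eval(body, {**env, **dict(zip(params, args))})

env = {"t": True, "+": operator.add, "-": operator.sub,
       "*": operator.mul, "=": operator.eq}

# Recursion works because the closure keeps a reference to env,
# and "fact" is added to env before the first call.
env["fact"] = lisp_eval(
    ["lambda", ["n"],
     ["cond", [["=", "n", 0], 1],
              ["t", ["*", "n", ["fact", ["-", "n", 1]]]]]],
    env)

print(lisp_eval(["fact", 5], env))
```

The three special cases (`quote`, `cond`, `lambda`) are the only forms that do not evaluate all their arguments first, which is exactly the spot where the FEXPR idea discussed next comes in.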


The way to grow from this book is to deeply learn what they did and how they did it, 
and then try to rewrite page 13 in a number of ways. How nicely can this be written in
“a lisp” using recursion? How nicely can this be written without recursion? (In both
cases, look ahead in the book to see that Lisp 1.5 had gotten to the idea of EXPRs and
FEXPRs (the latter being functions which don't eval their arguments before the call;
thus they can be used to replace all the “special forms”). Do a Lisp made from FEXPRs
and get the rest by definition, etc.)


What is a neat bootstrapping path? How could you combine this with Val Schorre’s
“Meta II” programmatic parser to make a really extensible language? What does it
take to get to “objects”? What are three or four really interesting (and different) ways
to think about objects here? (Hints: how many different ways can you define “closures”
in a language that executes? What about using Lisp atoms as a model for objects?
Etc.)


The idea is that Lisp is not just a language but a really deep “building material” that is 
tidy enough to “think with” not just make things (it's a “building material” for thoughts 
as well as computer processes). 


Dani Richard reminded me to mention: “Computation: Finite and Infinite 
Machines” by Marvin Minsky (Prentice-Hall, 1967), which (since it is one of my
favorite books of all time) I'm surprised I didn’t include in the original list. Marvin
could really write, and in this book he is at his best. It is actually a “math book” — with 
lots of ideas, theorems, proofs, etc., — but presented in the friendliest way imaginable 
by a great mind who treated everyone — including children — as equal to him, and as 
fellow appreciators of great ideas. There are lots of interesting things to ponder in this 
book, but perhaps it is the approach that beckons to the reader to start thinking “like 
this” that is the most rewarding. 


“Advances in Programming and Non-Numerical Computation” (Ed. L. Fox) 
mid-60s. The papers presented at a 1963 summer workshop in the UK. The most 
provocative ones were by Christopher Strachey and several by Peter Landin. This was 
one of the books that Bob Barton had us read in his famous advanced systems design 
class in 1967. 


Try “The Mythical Man-Month” by Fred Brooks, for an early look and experience 
with timeless truths (and gotchas) from systems building with teams ... 


Try “The Sciences of the Artificial” by Herb Simon. A much stronger way to think 
about computing — and what “Computer Science” might mean — by a much stronger 
thinker than most today. 


“A Programming Language” by Ken Iverson (ca 1962). This has the same thought 
expanding properties of Lisp. And, like Lisp, the way to learn from these really old 
ideas is to concentrate on what is unique and powerful in the approach (we know how 
to better improve both Lisp and APL today, but the deep essence is perhaps easier to 
grasp in the original manifestations of the ideas). Another book that Barton had us 
read. 


I like Dave Fisher's 1970 CMU Thesis, “Control Structures for Programming
Languages”, most especially the first 100 pages. Still a real gem for helping to think
about design and implementations. 


More recent: (80s) “The Meta-Object Protocol” by Kiczales, et al. The first section 
and example is a must to read and understand. 


Joe Armstrong's PhD thesis — after many years of valuable experience with Erlang 
— was published as a book ca 2003 ... 


Lots more out there for curious minds .... 


Alan Kay
I am the Alan Kay in question. - 4y


What does Alan Kay think about tension between sharing research methods 
and findings freely and early & preserving commercial viability so the 
findings can be made into products that change the world? 

What do you think about the Internet? Etc.? 


The question doesn’t really make sense as stated because sharing the best ideas freely 
creates the greatest opportunity for conversion into products, whether world 
changing or otherwise ... 


I.e. what “tension”?


The really big ideas — like those from the ARPA/Parc research community — can't be 
reasonably developed by any single organization. 


Really big ideas create new industries for the whole world. 


Most business people and politicians do not understand this at all. 


Alan Kay
Still trying to learn how to think better - Updated 4y

American computer pioneer Alan Kay's concept, the Dynabook, was 
published in 1972. How come Steve Jobs and Apple iPad get the credit for 
tablet invention? 




The Dynabook idea happened in 1968. 


Figure: The First Dynabook idea, Fall 1968 (Alan Kay). Label words visible in the drawing: children as full-fledged users; keyboard; removable; end-user; mass memory; collaborative; programmable.


But the simple part of the idea — a personal computer on the back of a flat screen 
display with a stylus and touch sensitivity — is hard to consider a real invention given: 


• Flat-screen displays. I saw the first University of Illinois one in June 1968
(Bitzer and Slottow), and I knew about the work at both RCA and
Westinghouse.


• The RAND Tablet was invented in 1964 by Tom Ellis et al.


• The Flex Machine (self portrait below) was started in 1967 (Cheadle and Kay).
It had a calligraphic display, but we made a stylus-tablet for it.


• Moore's Law (first described by Gordon Moore in 1965).


The U of Illinois’ purpose for their flat-screen display was for a non-portable time- 
sharing system terminal. The flat-screen “newspaper reader” described by Arthur C. 
Clarke in “2001: A Space Odyssey” was just for consumption. And flat-screen TVs
“soon” had been a staple of Popular Science for many years. These were not all that 
interesting. But thinking of using a flat-screen display for a really portable personal 
computer system was much more so. 


I knew how many transistors were in the Flex Machine, so the question was “When
could we get these on the back of a large enough flat-panel display?”. This was a
lively topic of conversation at the first ARPA grad students meeting (1968, U of Illinois) 
after seeing the little plasma panel demo. 


The answer we calculated was — if the US were to really try — about 10 years, but 
very doable, especially if a solution could be found for portable low power 2nd level 
memory. 


But the actual “Dynabook” idea came a few months later in the Fall of 1968 after I had
visited Seymour Papert and Cynthia Solomon's first LOGO classrooms. This changed
my view of computers and personal computers from just “vehicles” and “tools” to
“meta-media” and “for children in important ways also”.




In other words, what got added to the simple idea was “cosmic purpose”, “service”, 
“curricula”, etc. The big hit from Papert and Solomon was that careful use and design 
of interactive computing could make a qualitative difference in the higher-level 
shaping of children’s thinking — not just learning important things earlier, but in 
taking on a much more powerful “epistemological stance” towards the world they 
were growing up in — to the point where they should be able to think much better 
than most adults do today (not a big feat, but desperately needed), and be stronger 
shapers of the future (really desperately needed). 


The context here was both the general woes of humanity, and also the sickeningly bad 
dynamics of the 60s, including the enormously daunting problems of race, the 
Vietnam war, and warnings about the environment. 


Our project at Parc was built around trying to understand how to do the “larger 
purposes” of a Dynabook for what was inevitable technologically in the future. The 
first name for the Alto at Parc was “The Interim Dynabook". Part of the goal of the 
project was to make at least 100 (we made about 1500) personal computers that were 
powerful enough to allow some of what a Dynabook would need to supply as 
“services” to fulfill the ideal. In the process we worked with hundreds of children and 
made many systems and user interfaces to find better matchups between them and 
the new medium. 




Steve Jobs was very taken by the Dynabook idea, and wanted to do one when I joined
him at Apple in 1984 — his rallying cries then were “Wheels for the Mind!”, and “The 
kids can’t wait!”. 


Many years later, the iPad was good at the simple media consumption convenience 
part of the idea, but fell quite flat in many ways with regard to the two larger ideas of 
(a) user authoring of meta-media, and (b) an environment to help children learn 
powerful ideas by making and sharing them. 


The latter was not in Apple's goals at all: 


• users (even children) were forbidden to make actively programmable things
on the iPad and share them on the Internet


• there was no stylus for drawing with the machine, and years later when one
finally was added they omitted a place to store it (!), and even then they did
not include proximity sensing (compare with the Wacom Cintiq).


• fluent typing was not addressed in the design (we knew in 1968 that even
with a touch-screen, a physical keyboard would almost certainly be needed).
It is somewhat bizarre (even in this bizarre world) that Microsoft took more
into account in its physical designs for the Surface ...


• most importantly, Apple made no effort to help children by funding
curriculum development, how to help them learn, etc. Steve's earlier “Wheels
for the Mind” ideal was long gone: buried in simple consumerism.


So: don't confuse the Dynabook idea with the physical resemblance to the iPad. The 
latter has thousands of times the capacity of what I had in mind, but its conception is
thousands of times more meager. 


Today the goals for this should be much larger than those I had more than 50 years
ago.


Alan Kay 
Still trying to learn how to think better - 4y 


Who first came with the concept of Computer Mouse, XEROX or Apple? 


Pointing Devices and Graphical User Interfaces (a partial history) 


There was an early electrically sensed joystick in the 20s — partly invented for 
“remote-control airplanes”. 


1926 Joystick Remote Control Circuit Patent 


NRL's C.B. Mirick patented his system for remote radio control of pilotless aircraft (or of
“any circuits or objects which are to be controlled at a distance”) using a “‘joy stick’
identical with the ‘joy stick’ provided in aircraft.” U.S. Patent no. 1,597,416, “Electrical
Distant-Control System,” Aug. 24, 1926.


“My invention relates broadly to distant control systems, and more particularly to a 
control circuit for a radio transmitter, and a selectively responsive circuit for a radio 
apparatus whereby particular controls may be caused to function individually or 
simultaneously.” 


A track-ball appeared right after WWII. 


The “light-gun" was invented by Bob Everett for Whirlwind in the very early 50s, used 
in the SAGE air defense system in the mid-50s on (which later also used track-balls), 
and memorably used by Ivan Sutherland in his iconic (literally and figuratively) 
Sketchpad system ca 1962. 


MIT Whirlwind Light Gun ca 1951-2 


Doug Engelbart and Bill English invented the mouse as an upside-down track ball in 
1964. 


The Engelbart-English mouse ca 1964 


Also in 1964 appeared the really amazing RAND tablet, by Tom Ellis, et al. 


Tom Ellis and the RAND Tablet ca 1964-5 


Larry Roberts did the Lincoln Wand in 1965 — it was a very early 3D pointing device 


Paul Rovner (who did much of the “wand” SW) pointing at an icon in space


In the last half of the 60s all of these devices were in some use for interactive graphics 
systems, especially in the ARPA research community. 


GUIs 


The earliest graphical user interfaces came from the ability — using a pointing device 
— to indicate something of interest on a screen. It is hard to pin down a first here — 
could have been a radar system plus physical buttons. The Whirlwind to SAGE 
progression in the early 50s used displays to show radar readings, coastlines,
airplane codes, and sometimes text commands that could be pointed at. The nature of 
the slow calligraphic displays, flicker, etc., tended to limit the use of the pointing 
device to where only it could really serve, and relegated most other inputs to physical 
buttons and dials. 


Ivan Sutherland's Sketchpad system was done in 1962 on the SAGE test computer (the 
Lincoln Labs TX-2), used a light-pen (a light gun without the handle), a point plotting 
display, a lot of computing capacity, and a breakthrough set of ideas, designs and 
implementations including: 


• rubber band drawing of lines


• automatic “attractive gravity between end-points”


• constraints for programming, represented as icons


• “masters and instances” (objects)


• and many more of the earliest ideas for graphical interaction


Sketchpad cannot be praised too highly. 




Ivan Sutherland making a bridge simulation in Sketchpad. He is holding the light 
pen. Other input devices are buttons to his left, and rotating dials in front of him 


Using Sketchpad’s light pen to draw a rivet 


At RAND, Tom Ellis et al were starting to make one of the great graphical user 
interfaces of the 60s (and all time). First, Gabe Groner used the RAND tablet to make 
the first really good gesture recognizers (ca 1965-6). This was used to make GRAphical 
Input Language (GRAIL). 


There is a good demo video of this online. This system first introduced many of the 
GUI concepts we are used to (and a few we aren't), including sliders, various kinds of 
iconic buttons, resizing and movement sensitive areas, intelligent drawing using the 
recognizers, etc. The boxes were hyperlinks to their interiors (which were also 
diagrams etc.). The diagrams could be simulated at various speeds, etc. 


Another system that can’t be too highly praised. 
And yet another: 


The Engelbart NLS system that had the famous demo in 1968 was part of a larger 
overall concept with a strong balance towards organized text with a little graphics, 
hyperlinks, “panes” (vertically tiled windows), and a number of deeply important
ideas. 


Bill English and the NLS television display (from a calligraphic display) and the 
Herman Miller designed lap-chair with keypad, keyboard and mouse 


Detail of the user interaction controls 


This system has been poorly treated by journalists, who want to see in it precursors to 
what we have today. Instead most of the many important properties of these ideas 
have been quite missed today. 


(Diagram: Education, Humans, Methods, Languages, and Artifacts, drawn as mutually linked.)


The Engelbart conception was a complex and important system made from 
highly intertwined subsystems that included humans, education, methods, 
languages, and tools. 


Finally we come to Xerox Parc — also part of the ARPA research community — that 
got its first critical mass of researchers in early 1971. We were fortunate to have “all of 
the above” for context, plus some of the researchers above themselves (and the 
community itself was quite friendly). 


We thought we could get a better interactive coupling with “personal computers” than 
could be had from time-sharing, so our effort was to see if we could make a one- 
machine-for-each-person system linked together with a network that could be 
comparable to the single user super computers used by Sketchpad and GRAIL. We 
decided to spend enough money and design effort to “work in the future” — to make 
personal computers for research that would be generally affordable via Moore's Law 
in the mid to late 80s. 


One of the very needed technologies was a computer display with the degrees of 
freedom of active paper. And so forth (I won't recount the history of this effort here). 


The Alto: showing the Engelbart 5 finger chord keyboard, the Parc digital mouse, 
the “supercomputer” below, a disk pack, the Ethernet “stinger” (upper right). 




Full page display on Alto screen showing “printing-like multiple fonts”, half-tone 
painting and animation, multiple overlapping windows, icons, user- 
programming, user font making, etc. It was this screen and content that Steve 
Jobs saw when he visited in 1979. 




An essay by a 14 year old girl about her programming experiences with Smalltalk 
that includes live coding in the bottom left, and a picture of the drawing system 
that she invented and programmed. 


Alan Kay
Still trying to learn how to think better - 4y 


In retrospect, what were some funny idiosyncrasies of older computers? 


Besides crashing randomly? 


One that was funny in retrospect came in 1965 when I and others from the “future
CDC 6600 community” — Livermore Labs, Los Alamos, National Center for 
Atmospheric Research, etc. — were called to CDC's Chippewa Falls labs to help debug 
the supercomputer and cobble together enough software to make it worthwhile (this 
is another somewhat unusual story). 


There were a number of causes of rather frequent hardware crashes of computers in 
those days. The 6600 seemed to have those plus a few more. 


Looking from the top, the machine was the shape of a plus sign. You could open the 
panels and the logic was on doors that would swing out — the logic-doors were 
refrigerated racks of components with Freon tubes running through them. 


The console — called a “Gazork" (a mythical beast) — was simply two CRTs (this 
required some of the machine to work in order to use). 


If you look closely at the wires on the doors, you can see that they are often long and 
of non-uniform length — this is because they tuned the timings of the machine by 
clipping these wires with a nail clipper! 


And many of the wires had little tags on them saying what the exact length should be 
when replaced. 


Further, the wires were also used as delay lines for the “parallel arithmetic units" of the 
machine — e.g. there was one multiplier unit but it was duplexed by using the wires as 
buffers for the partial products (it is shown as two multiplier units below). 


And so forth. Add to this that the machine had 11 different processors of two types, 
no interrupt system, no core memory parity test, the 1 microsecond memory had 32 
banks (the low order 5 bits of an address) and you had to have these always cycling 
100 nanoseconds apart in order to match up to the 10 MIPs processor, this by 
squeezing a few instructions into the tiny cache, etc., and you have more than a 
handful of “stuff” to get anything reasonable to happen. 
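
To make the memory-bank point a little more concrete, here is a minimal sketch in Python. It is only a toy model and not how the 6600 actually worked in detail: it assumes one memory request is issued every 100 nanoseconds, that a bank is busy for a full microsecond after each access, and that the bank number is simply the low order 5 bits of the address. The function names are made up for illustration.

```python
BANKS = 32              # 32 banks, selected by the low order 5 bits of an address
BANK_CYCLE_NS = 1000    # each bank is busy for about 1 microsecond per access
ISSUE_NS = 100          # assumed: one new request can be issued every 100 ns


def bank_of(address):
    """Return the bank number: the low order 5 bits of the address."""
    return address & (BANKS - 1)


def total_time_ns(addresses):
    """Toy model: time to issue all requests, stalling when a bank is still busy."""
    free_at = [0] * BANKS   # time at which each bank becomes free again
    now = 0
    for address in addresses:
        bank = bank_of(address)
        start = max(now, free_at[bank])   # wait if the bank is still cycling
        free_at[bank] = start + BANK_CYCLE_NS
        now = start + ISSUE_NS
    return now


# Sequential addresses rotate through all 32 banks, so nothing stalls.
sequential = total_time_ns(range(64))
# A stride of 32 lands on the same bank every time, so every access waits.
same_bank = total_time_ns(range(0, 64 * 32, 32))
print(sequential, same_bank)
```

In this toy model the sequential pattern runs at the full issue rate, while the stride-32 pattern is roughly ten times slower, which is the kind of thing the programmer had to keep in mind.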




Rough diagram of the 6600. Of note are the independent parallel “central” CPU 
abilities on the right, and the large “memory machine” on the left with the 10 
“peripheral processor” machines. 


The oddest phase was when some of the random crashes seemed to happen when 
there were more than 4 or 5 people in the room. The 6600 could count people? 


However, this resisted tests. 


One of the displays on the Gazork was used to look at parts of raw memory (in octal). 


One day, just by luck, someone noticed a bit flickering on and off when the central 
computer was idle. This eventually was found to correlate with the doors swung out or 
not, and if out, with the number of people near that door. It turned out there was 
capacitive coupling between one particular door and the humans nearby, and this 
destabilized one of the bits in memory (with no complaint because no parity checking),
and sometimes this bit needed to be good for some programs. 


I would call this an idiosyncrasy! And only funny after it was tracked down and fixed!
There's nothing all that funny about unreliably repeatable errors when they are 
happening to you ... 


Alan Kay
Still trying to learn how to think better - 4y 


What did the early room sized super computers do? And how powerful are
they relative to modern day computers?


We could look at the AN/FSQ-7 (known as the “Q7”), which was the main
computing architecture for the SAGE air defense system in the 50s.


It was actually two very large computers — comparable with a football field — where: 


• the computational elements occupied the second floor of a large concrete
“blockhouse” (there were about 24 blockhouses around the US and Canada),


• the air-conditioning and power supply occupied the first floor (and spilled
over into an auxiliary building),


• a command and control center was on the 3rd floor,


• and many (~ 100-150) interactive graphical terminals with pointing devices
for different kinds of tracking on the top floor.


(Figure: Control Center, typical floor plans, sheet 2 of 3.)


The only good way to estimate the computing power of this machine relative to today 
would be to run benchmarks. 


However, here are some statistics for each machine: 


• 50,000 vacuum tubes


• instruction cycle time = 6 microseconds for basic instructions


• main memory (magnetic core) = 256K bytes as 65,536 32-bit words


• secondary memories (drum) = 50 “fields” of 2048 words each = ~800K bytes


Much of the arithmetic was done on half words two at a time. The two machines ran 
the same program in case one went down. 


Perhaps readers can try estimations against an iPhone or an iWatch. 


• Basically we are comparing billions of operations per second with about
170,000, so we should expect an iPhone to be considerably more than a
factor of 5000 faster.


• iPhone XS has 4GB of RAM, which is about a factor of 16,000 more.


• Perhaps a single iPhone XS is about 16,000 Q7s?
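
For readers who want to redo the rough estimate, here is a small Python sketch of the arithmetic. The Q7 figures are the ones listed above. The phone speed of one billion operations per second is only an assumed lower bound standing in for “billions”, and the variable names are made up for illustration.

```python
# Q7 figures from the list above
Q7_CYCLE_SECONDS = 6e-6      # basic instruction cycle time
Q7_WORDS = 65_536            # words of core memory
BYTES_PER_WORD = 4           # 32-bit words

q7_ops_per_second = 1 / Q7_CYCLE_SECONDS       # about 170,000
q7_memory_bytes = Q7_WORDS * BYTES_PER_WORD    # 256K bytes

# Phone figures: the speed is an ASSUMED lower bound, the RAM is the 4GB quoted above
PHONE_OPS_PER_SECOND = 1e9
PHONE_MEMORY_BYTES = 4 * 2**30

speed_ratio = PHONE_OPS_PER_SECOND / q7_ops_per_second
memory_ratio = PHONE_MEMORY_BYTES / q7_memory_bytes

print(f"Q7 speed:     about {q7_ops_per_second:,.0f} operations per second")
print(f"speed ratio:  at least about {speed_ratio:,.0f}x")
print(f"memory ratio: about {memory_ratio:,.0f}x")
```

This is not a benchmark, just the back-of-the-envelope comparison: a speed ratio of several thousand at the very least, and a memory ratio of about 16,000.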


I would be curious to see more careful comparisons.


Alan Kay
Still trying to learn how to think better - 4y


How important was Ada Lovelace for computer science? 


This is a fascinating question about a “synthetic” or “artifactual” science (math is one 
of these also). This is a parallel to asking "how many times was the arch invented?", 
and “for a particular arch e.g. in the Arc de Triomphe, would it have mattered if the 
ultimate ancestor of that arch had died before the invention?” i.e. the arch is so useful 
that it probably was invented several times and once seen, was appropriated. 


On the other hand, vowels in languages with symbols for speech sounds are so useful, 
one would think they would be invented numerous times, but history indicates it only 
happened once. 


But calculus was invented several times, etc. 


Babbage is not mentioned in Turing’s original paper on computability and “universal 
machines", but years later is mentioned in Turing’s AI essay. In between apparently
Babbage was mentioned at Bletchley Park during discussions with his mentor Max 
Newman, the driving force — with Tommy Flowers — behind the programmable 
Colossus. By this time there was nothing that knowledge of Babbage or Ada could 
have aided. 


It is surprisingly hard to tease out where engineering ideas stemmed from, in no small 
part due to the way engineering is learned and advanced. For example, history thinks that
the first electronic flip-flops were by Eccles and Jordan ca 1916, but seems to be 
vague on whether the use of this particular milestone circuit fed into the Tommy 
Flowers “valve logics” at Bletchley, or for ENIAC. 


Flip-flops themselves had been around mechanically for a very long time, most
especially in the registration mechanisms of 19th century pipe organs (something that
it was clear that Babbage was not at all aware of, or he might have seen a much
better way to make a computing engine ca the 1840s; the best technologies around
this time for computing were pneumatic).


I think it might be fair to suggest that neither Babbage nor Ada were required to get 
computing invented, and it is likely the case that neither of them were direct 
inspirations for the much later inventions. 


But we should celebrate them for being “early with insights”. 


A similar chain to follow is that from Boole to Charles Peirce (a brilliant thinker who 
wrote ca 1870s that you could make a computing engine by using Boolean Logic and 
first proposed what we now call the NAND and NOR as universal logic operations). 
This operation is called the “Sheffer Stroke” today because Sheffer rediscovered these 
ideas in 1913. 
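
What “universal” means here can be shown in a few lines. The following is a minimal Python sketch (the function names are mine, chosen for illustration) that builds NOT, AND, OR and NOR out of nothing but NAND and then checks them against every input combination.

```python
from itertools import product


def nand(a, b):
    """The single primitive: false only when both inputs are true."""
    return not (a and b)


# Everything below is built only from nand().
def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    return nand(not_(a), not_(b))


def nor(a, b):
    return not_(or_(a, b))


# Check each derived operation against Python's own logic on all inputs.
for a, b in product([False, True], repeat=2):
    assert not_(a) == (not a)
    assert and_(a, b) == (a and b)
    assert or_(a, b) == (a or b)
    assert nor(a, b) == (not (a or b))
```

The same construction works starting from NOR instead of NAND.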


Finally, it's worth looking at what Konrad Zuse accomplished pretty much all by 
himself during the early 40s. This came to light years after the independent inventions 
of most of the same things that Zuse wound up with. 


When we had to write histories for the ACM History of Programming Languages the 
eminent historian of science Mike Mahoney told us the key was “We would know what 
they thought when they thought it”. In the case of languages that were invented 
within the ARPA community, this was really difficult because there were so many ideas
being discussed.


I think this is the case of Ada and Babbage. The question asked here probably can't be
answered accurately at this point. A vague answer for Babbage and Ada would be 
“Probably not a critical factor, and very likely not a factor at all”. 


But she and Babbage should get full marks — as we give Zuse — for what they did do: 
very early insights and examples into many important parts central to what later 
became computer science. 




Alan Kay
Have designed a few programming languages - Updated 4y


Were the compilers of the first programming languages written in machine 
code? 


I wonder if “machine code” here means “in on-or-off-bits” or “assembler”? And
whether “programming languages” includes “assembler”? 


In the case of a really bare machine with no reasonable way of using another 
computer to help, programmers would still write in something higher than “raw bits”;
this could be something like flowcharts on paper with arithmetic expressions, and then
carefully hand translate this into a memory image of raw bits. This was tedious, but 
not considered out of bounds, because a lot of debugging was done by looking at a 
“core dump”: an image of memory, often in octal (base 8) numbers. So the 
programmers would have in their mind just what the CPU was actually chewing on 
and making. 


Also, occasionally one could debug on the machine (the Air Force allowed a 
programmer about 3 minutes a day and one had to work through a machine operator 
who could show a word in memory and flip switches to change it). The console 
switches were also used to dry start a computer by inputting a “boot loader”, usually
for reading cards or paper tape. The CDC 6600 had a convenient panel of ordinary 
switches that one could use to set up a program of about 16 or so instructions to get 
things started. 


The first interactive debuggers — via teletypes or flexowriters — appeared on 
Whirlwind at MIT and its offshoots at Lincoln Labs in the early to late 50s. 


Assembly languages appeared quickly for every computer. Most of these had macro 
“expanders”, and some of these were quite elaborate — for example IBM Autocoder 
for the 1400 series had “tailored macros”: on each line of the macro you could write 
conditions on the parameters that would determine whether or not that line would be 
included in the expansion (thus optimization at assembly time). On a 1401 with e.g. 
just 2k 6-bit bytes of memory, this was a great help. 


Given the tiny logical size — a few thousands of bytes of memory — and very slow 
speeds — fixed point addition times of ~ 6 microseconds or more (early serial 
machines like EDSAC had an addition time of 1.5ms!) — of the computers in the 50s 
and 60s, many programs were hand-crafted in “overlays” (no virtual memory or 
memory mapping). One way to think of this is that the workable memory of the 
computer was on the secondary storage devices such as tapes and slow disks or 
drums, and the tiny still slow core memory was used like a cache. 


Some of the early computers used a drum as a primary memory — e.g. IBM 650, 
Burroughs 205 — and perhaps tapes or slower drum (or nothing) as a secondary 
memory.


A concrete example from the 50s is the famous IBM 704 mainframe ca. 1954, the first 
commercial computer with floating point arithmetic, and an ancestor of many 
subsequent hardware and software systems. It had vacuum tube logic, could do about 
12,000 floating point additions per second, and had a primary memory of about 
18KBytes (4K 36 bit words) of magnetic core memory. 
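
The 704 figures above can be checked with a few lines of arithmetic. This Python sketch uses only the numbers just quoted; the variable names are mine.

```python
WORDS = 4 * 1024          # 4K words of core memory
BITS_PER_WORD = 36

total_bits = WORDS * BITS_PER_WORD
total_bytes = total_bits // 8           # counting in 8-bit bytes
kbytes = total_bytes / 1024

ADDS_PER_SECOND = 12_000                # floating point additions per second
microseconds_per_add = 1e6 / ADDS_PER_SECOND

print(f"{total_bytes} bytes = {kbytes:.0f}K bytes")
print(f"about {microseconds_per_add:.0f} microseconds per floating point addition")
```

So the whole primary memory was about 18K bytes, and a floating point addition took roughly 83 microseconds.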


The IBM 704 Console and some of the CPU 


Its advent in 1954 prompted John Backus to propose making for it an efficient high- 
level language with an optimizing compiler — it would be a first, and he decided to 
call it FORTRAN (for FORmula TRANslator).


The most interesting thing about FORTRAN was the great effort put into optimizing 
the semi-higher level code so that it could come close to competing with hand written 
assembly code in both size and speed. The bet by IBM for this large risky project was 
that just being able to handle simple arithmetic formulas and a few other things — 
like simple indexing usually in loops — would promote adoption if the size and 
performance of compiled code was good enough. 


Here is an excerpt from a letter about the first tests of FORTRAN in April 1957 by B.G. 
Oldfield: 


Our average compiling time is approximately 15 FORTRAN statements per minute and 
using an expansion ratio of 7 this means that 105 machine language instructions per 
minute can be compiled. Considering the addition of subroutines and input and output, 
compiling time will normally require two to three times as much 704 time as an 
equivalent assembly. 


You can see that this is right on the edge of acceptability in a normal shop of one 
small, slow, but very expensive, computer that had been purchased to run jobs, not 
assemblies or compilations. The theory back then was that programmers were cheap
and their job was to do the optimizations and debugging offline. 


This was even more acute in my 1401 shop in the Air Force ca 1962. The FORTRAN 
compiler for our 8K-byte 1401 took literally about 100 passes — writing back and 
forth to tapes — to compile a FORTRAN program that would fit. So it — and COBOL 
— which had also appeared were essentially only curiosities. 


However, again from Oldfield: 


I would like to summarize by saying that our experience to date indicates that FORTRAN
will 1) be very easy to learn, 2) quicker to code by a factor of 4 or 5, 3) have fewer coding 
errors (approximately 71 per 150 ), 4) will reduce the overall cost of a written instruction 
by a factor of approximately 3 or 4, 5) will produce very efficient object programs, 6) 
should handle at least 75 percent of our problems. 


From our experience to date, I am convinced that FORTRAN is a very significant step
forward in program preparation for the 704 and will prove to be a very satisfactory 
programming system. 


In other words, FORTRAN was a great step forward. 


An exception to long compile times — which is out of the scope of this answer — was 
the Algol 58 compiler for our other machine: the Burroughs B220 (also with an odd 
architecture by today’s standards). This was the first “pass-and-half” compiler and it 
was quite efficient, but only so-so regarding optimization (the B220 was larger so less 
optimized code could sometimes fit). One of the several motivations for this design 
was the very-very slow tape drives of the 220: gazillions of passes were out of the 
question. Two of the famous names who worked on this early compiler were Bob 
Barton and Don Knuth. 


An interesting sidelight is from Tony Hoare's Turing Award Lecture (ca 1980) where he 
tells about his first big project ca 1961: to do a workable subset of Algol 60 on the 
rather tiny and slow Elliott 803 computer. He had been very taken by being able to 
use recursion (and could see that this would make his non-implemented idea for 
“Quicksort” really easy to do). This also motivated his approach to use top-down 
recursive descent for the Algol compiler on the Elliott machine. He says: 


“The compiler was structured as a collection of mutually recursive procedures, each 
capable of analyzing and translating a major syntactic unit of the language--a 
statement, an expression, a declaration, and so on. It was designed and documented in 
ALGOL60, and then coded into decimal machine code using an explicit stack for 
recursion.” 


This is very like the earlier use of flowcharts and other higher-level forms as design 
guides for a program, and then hand-translating into machine code to get something 
that could run efficiently enough. 
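
To give a feel for the structure Hoare describes, here is a minimal sketch in Python of a recursive descent translator. It is an illustration only and not Hoare's design: it handles just arithmetic expressions, the output instruction names (PUSH, ADD, SUB, MUL, DIV) are invented for a hypothetical stack machine, and the recursion is left to Python rather than hand-coded with an explicit stack as it had to be on the Elliott 803.

```python
import re


def tokenize(text):
    """Split the input into integer literals, operators and parentheses."""
    return re.findall(r"\d+|[-+*/()]", text)


class Parser:
    """One procedure per syntactic unit; the procedures call each other recursively."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.code = []      # generated instructions for a hypothetical stack machine

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression(self):   # expression := term (("+" | "-") term)*
        self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            self.term()
            self.code.append("ADD" if op == "+" else "SUB")

    def term(self):         # term := factor (("*" | "/") factor)*
        self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            self.factor()
            self.code.append("MUL" if op == "*" else "DIV")

    def factor(self):       # factor := number | "(" expression ")"
        token = self.take()
        if token == "(":
            self.expression()           # recursion back to the top-level unit
            if self.take() != ")":
                raise SyntaxError("expected ')'")
        elif token is not None and token.isdigit():
            self.code.append(f"PUSH {token}")
        else:
            raise SyntaxError(f"unexpected token {token!r}")


def compile_expression(text):
    parser = Parser(tokenize(text))
    parser.expression()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return parser.code


print(compile_expression("2*(3+4)"))
# ['PUSH 2', 'PUSH 3', 'PUSH 4', 'ADD', 'MUL']
```

The high-level form stays readable as a design, and each procedure is small enough to hand-translate into machine code if that is what the target machine demands.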


Good questions would be “What is the first compiler to be written in a higher level 
language?” and “What is the first compiler for a programming language to be written in 
itself?” I think I used to know this, but it’s drifted. Large parts of this problem were
addressed by the first Lisp community at MIT in the early 60s, and this could be the
answer. Very quickly, they used a machine coded Lisp to write LAP (Lisp Assembly 
Program), and there is mention of several Lisp compilers in the early documentation. 


However, I think the first really serious example of a higher level language being used
to the exclusion of all else was ESPOL on the Burroughs B5000 (ca 1962), which was a 
computer whose “machine code” would be called “byte-codes" today, and which had 
many other advanced features (some beyond current day designs). This whole systems 
design cannot be too highly praised. 


The B5000 was designed to run programs from very high level languages like Algol, 
COBOL (and eventually Simula), and one big early decision was that there should only 
be higher level programming for the operating system and compilers. To this end they 
added a number of features to Algol to create ESPOL (Executive Systems 
Programming Language). The OS, the compilers, etc were all written in this language. 
Using the “1.5 pass” compilation and the efficiency of the machine, the B5000 could 
compile about 300 statements per second. 
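
For a rough sense of scale, the two compile rates mentioned in this answer can be put side by side. This Python sketch uses only the figures quoted above (Oldfield's 1957 numbers for FORTRAN on the 704, and the B5000 rate). The languages and machines differ, so treat the ratio as an order-of-magnitude comparison at best; the variable names are mine.

```python
# FORTRAN on the 704, from Oldfield's letter
FORTRAN_STATEMENTS_PER_MINUTE = 15
EXPANSION_RATIO = 7       # machine instructions per FORTRAN statement

# B5000 with the "1.5 pass" compiler
B5000_STATEMENTS_PER_SECOND = 300

instructions_per_minute = FORTRAN_STATEMENTS_PER_MINUTE * EXPANSION_RATIO
b5000_statements_per_minute = B5000_STATEMENTS_PER_SECOND * 60
rate_ratio = b5000_statements_per_minute / FORTRAN_STATEMENTS_PER_MINUTE

print(f"704 FORTRAN: {instructions_per_minute} machine instructions per minute")
print(f"B5000: {b5000_statements_per_minute} statements per minute")
print(f"ratio: about {rate_ratio:.0f}x in statements per unit time")
```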


All this was possible so early on because this was one of the few instances in 
computing history where all parts of the end problem (including hardware, OS,
languages, and deployable applications software) were all deeply considered in the
overall designs of the whole system. Bob Barton was arguably the key figure for this 
project, but as with most great things, there were quite a few high talents involved. 
This whole systems approach to design was an inspiration for several subsequent 
projects, most notably those at Xerox Parc. 


Alan Kay
Still trying to learn how to think better - Updated 3y


Who created the computer? 


There have been similar questions on Quora, and I've given a variety of answers. 
(Which I'll try to find and reference here (would be really usable if Quora used that 
strange invention of “windows” to allow this kind of perusal on this page while writing 
an answer ...).) 


A computer is a process that deals with represented descriptions. 


So you can make an “addition computer" by using two rulers .... When we make an 
algorithm — such as the greatest common divisor — we are making a computer (and 
this computer goes back to Euclid). In both cases we are doing some of the energetics 
for the computations. 
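
The greatest common divisor “computer” fits in a few lines. This is a minimal Python sketch of the usual remainder form of Euclid's algorithm; the function name and the sample numbers are mine.

```python
def gcd(a, b):
    """Euclid's algorithm: replace (a, b) by (b, a mod b) until b is zero."""
    while b:
        a, b = b, a % b
    return a


print(gcd(1071, 462))   # 21
```

Done by hand on paper, the person supplies the energy for each step; run on a machine, the machine does.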


Here, I think “computer” means “programmable, digital, stored program computer”.
Historically, we need to separate early good ideas for these from the actual building of 
successful mechanisms. For example, both Babbage and Ada wrote algorithmic 
programs, and Babbage had Jacquard loom-like schemes for programming a 
mechanical computing device. Ada felt the universality that might be possible, and 
expressed the idea obliquely in several of her writings. 


Turing proposed a very modern kind of mechanism for a computer, and especially a 
universal computer in 1936. Mathematically this was equivalent to earlier results by 
Goedel and Church, but the form that Turing chose was loaded with pragmatic 
whispers. Some of these were later heeded by John von Neumann. 


I think the award to the person who made the earliest real mechanisms that touched 
more of the important ideas in computing than anyone should go to Konrad Zuse. He 
was not the only one operating in the early 40s, but the scope of his accomplishments 
is really impressive. Konrad Zuse - Wikipedia 


Konrad Zuse and the Z3 programmable digital computer from 1941 


Zuse started to build computing systems in the last half of the 30s. Because of WWII 
his work was lost for a while. 


Here's a good chance to use the computer you are using right now, to find out about 
Bletchley Park, Colossus, Max Newman, and Tommy Flowers. This work during WWII in 
the UK was also suppressed for many years. 


After WWII, there was interest in creating programmable computers in both the US 
and the UK. Maurice Wilkes of Cambridge University in the UK attended a workshop at 
Penn where Eckert, Mauchly, von Neumann, et al attended. For Wilkes there was a bit 
more talking than doing, so he decided to just make a machine like one they were 
talking about. He went back to Cambridge and did just that, to produce ca 1948-9 
EDSAC, my vote for “something wonderful" as a “first” real programmable machine.* 


Note the CRTs in the background that were used as displays for EDSAC. 


There is an online emulation of EDSAC that allows the user to program it — I've 
occasionally asked students to write some programs on it. Edsac Simulator 


The online EDSAC Emulator 


Finally, it's very worthwhile to look at the MIT Whirlwind I computer ca 1950-51. It 
was not just a very modern parallel “mini”-computer (it required a few large rooms), 
but also the first “real-time” computer. It had some of the earliest (maybe the first) 
interactive graphical displays, and (I think) the first interactive higher level 
programming language (“George” by Laning and Zierler). 


MIT Whirlwind console room ca early 50s. 


* This is not to omit “Baby” at Manchester, which was a small test machine that did get 
working a little before EDSAC. However, EDSAC was not a test computer, but the thing 
itself. 


Alan Kay 
Still trying to learn how to think better - Updated 4y 


Why were computer monitors not rectangular to begin with? 

I'm not sure I understand this question. Do you mean “as opposed to being circular” 
or “as opposed to being square”? If you clarify your question I might be able to 
answer. 


In brief, many of the earliest computer displays were made from Cathode Ray Tubes 
originally used for radar (most were circular) or for oscilloscopes (most were circular). 


Whirlwind with one of the first interactive computer displays (early 50s) 


P.S. The questioner, Leslie Anne, writes in a comment: I am thinking in terms of 
90's desktop PCs. That's about as far as my knowledge/experience with any type of 
monitor goes. Thank you! :) 


Movies are in rectangular “landscape mode” in part because this is a rough 
approximation to human two eyed field of view (as is an open book showing two 
pages, etc.) This is a little tricky because most of the acuity in an eye is in the small 
foveal region, and the eyes actually scan the scene and the brain puts the scans 
together to form a “psychological scene” that seems more expansive. 


TV started with circular CRTs (though displaying a rectangular raster of scan lines), 
but their popularity led to rectangular landscape screens of larger and larger size 
(which led to a switch to magnetic yoke beam control, etc.). 


In the 1960s you could get a wide variety of both kinds of CRT in various shapes. For 
example, the FLEX Machine display used a Hewlett-Packard electrostatic CRT in 
landscape mode that was also rectangular (as did the initial RAND Tablet display a few 
years earlier). 




Tom Ellis and his RAND Tablet using the large screen HP Electrostatic Monitor 


The Xerox Alto Display at PARC was a high quality magnetic yoke display that came 
from Ball Brothers (of the aerospace company Martin Marietta). For a variety of 
reasons we first went “portrait mode" (because of the Dynabook and because we 
wanted a full page deep). Not long after, still in the 1970s, we were able to make a 
double page sized display. 


Our main complaint when the 1980s came around is that the display sizes of 
commercial personal computers were greatly reduced but kept landscape so that full 
pages couldn't be displayed. Apple or one of its after market companies made a 
display that could be rotated from portrait to landscape (and would tell the computer 
so the image could be changed accordingly). 


Alan Kay 
Still trying to learn how to think better - 4y 


What is your hobby that has the best bang for the buck in happiness per 
dollar? 


I've never thought about hobbies in terms of cost — they are all about passion and 
“work/play” for the sheer fun of it. And my main ones are about “life itself" and non- 
optional. So the “happiness” part from these is enormously high. 


I don't know how to choose between reading and playing music, but I spend quite a 
few hours every day doing both. 


A book takes me about 1-5 hours to read, and will cost around $20-$40 usually (I 
mostly read non-fiction). 


So: perhaps $10/hour? (Does this even mean anything?) 


Right now I'm learning to play the lute at least 2 hours every day. The cost is mostly 
getting a great instrument (I've found that it’s worth really committing if you are 
going to learn an instrument, and then being willing to pay for a very good instrument 
— it makes a big positive difference, even in the beginning, both for achieving sound 
and for deep inspiration). 


For music, figure 1000 hours each year for lute practicing, add in attending 2-4 
concerts a month, hours singing in choirs, etc. 


Even if we don’t amortize the lute over multiple years, it roughly breaks even with 
books at $10/hour, and the net is quite a bit less over multiple years. 


Both of these seem like incredible bargains to me (and most certainly not at all 
deserving of money worries). 


Alan Kay 
Still trying to learn how to think better - 4y 


What is the oldest or most unique computer you have ever used? 

The “oldest” -or- “most unique” computer I've ever used? Let's take “programmable 
digital computer" to simplify away from various kinds of analog computers and the 
abacus, and “written a program for" instead of “used”. 


The “oldest” has two candidates: (a) the RAND “Johniac”, which started working in 1953, 
and (b) the Lincoln Labs TX-2 computer, which dates back to 1956. However, I didn’t write 
programs for these until 1967. The first real programmable computer I wrote 
programs for was the IBM 1401 in Air Force Air Training Command ca 1961, followed 
shortly by also programming for ATC’s Burroughs 220 machine. And then a host of 
others. 


“Most unique” is harder because so many of the early machine architectures would be 
considered “unique” by today’s pretty vanilla conventions. For example, looking from 
today back then, all of the computers I've mentioned would be considered “unique”, 
and they were all very different from each other. 


For me the most challenging computer on which to write “good” machine code was 
most definitely the Control Data 6600 ca 1965 — it was a parallel machine with 
multiple operational units and a tiny cache and resequencer that the programmer had 
to have accurate visualizations of to make worthwhile programs. 


And then there is microcoding. The most challenging there was the FLEX Machine that 
Ed Cheadle and I did ca 1967-9. This was because the microcode memories available 
then that could be used for a personal computer were so slow. The remedy was to 
have a very wide code word (120 bits wide on this machine!) and try to do as much as 
possible within these 120 bits before having to fetch another code word (yikes!). By 
contrast, the Xerox Parc Alto had a fast microcode memory with quite nice 32 bit 
microcode words so just a few little tricks were required. 


Alan Kay 
Still trying to learn how to think better - 4y 


What was the last breakthrough in computer programming? 

There are lots of dimensions to “computer programming” and “programming 
languages”, so simple sorting won't be satisfactory. But there is also a dilemma, I 
think, in that “something happened” in the 80s to what were major improvements 
every 11 years or so. In 1984 I was asked to write the lead article for the old Scientific 
American for a special issue on “Computer Software”, and included the following 
diagram: 


[Diagram from the 1984 Scientific American article; caption follows.] 

SOFTWARE GENRES succeed one another at sporadic intervals, as is shown here through the 
example of some programming languages. Languages are categorized rather arbitrarily by 
level, although [...] overlap. There are low-level languages (LLL), high-level languages 
(HLL), very-high-level languages (VHLL) and ultrahigh-level languages (UHLL). In the 
evolution of programming languages a genre is established [...], then after a few years 
an improvement is made [...]. In time the improved language is seen to be not merely a 
“better old thing” but an “almost new thing,” and it leads to the next stable genre. The 
language Lisp has changed repeatedly, each time becoming a new genre. 


The article noted with tongue in cheek that the transition from “a better old thing” to 
“almost a new thing” and then “a new thing” happened at about sun-spot intervals ... 


Lisp shows up both as a transitional language and as a “new thing” because more than 
most computer languages it appears as a kind of “material” as much as “a 
programming style”. 


In 1984 I picked several other things that I thought were transitional. For example, the 
advent of spreadsheets brought up the possibility of massively parallel “swarm” 
programming, and I devoted some of my article to discussion about what Moore's 
Law would allow. I thought the other promising direction would be “more meaning” in 
programming, and especially programming languages and systems that could do 
problem solving on the behalf of the larger goals for the programming. This was 
inspired by what Sketchpad had been able to do for numerically constrained 
relationships, and extended the idea to “requirements” and “specifications” becoming 
“runnable” and “debuggable”. I said less about this in the article. 


If we look at this today, 35 years later, it is hard to fit what happened into the 
ascendant qualitative levels | used for the first 35 years of programming. We can see 
that though Simula was one of the inspirations for Smalltalk, its transitional model 
remained as C++. The Planner to Prolog to Eurisko line petered out in programming, 
but did form a major part of a real “meaning based” system, Doug Lenat’s CYC. 
Interestingly, the “swarm object pattern matching" ideas did get reified by Joe 
Armstrong's Erlang — he started independently with his own instincts and goals — 
and this language is quite active today, though perhaps not mainstream. 


To try to be as brief as possible here, the surprising thing to me over the last 35 years 
has been the lack of “real engineering vigor" in “software engineering”. While other 
engineering fields were starting to use CAD and SIM to design and vet their designs 
— often on supercomputers — the shoemaker's children wound up with no shoes 
with so much software development retaining poor and weak methods from the past 
and staying with simulated punched cards and non-live development. 


And in the last 15-20 years, the real engineering fields have been increasingly able to 
add automated "FAB" to the CAD<->SIM process and derived from it. In software 
engineering, this would mean being able to automatically move from vetted designs 
on supercomputers to optimized systems that would work on commodity machines. 
One part of this optimization has been done — it does not use “meaning” but is more 
of a Moore's Law device: namely the tracing JITers that are being used more and more. 
The seed of this technique was pioneered by the legendary Peter Deutsch for 
Smalltalk systems in the early 80s, and, today, has been taken far enough to be really 
useful along many dimensions. 


As for programming itself, the rallying cry I've tried to put forth is: “It's not BIG DATA, 
but BIG MEANING". In other words, the next significant threshold that programming 
must achieve is for programs and programming systems to have a much deeper 
understanding of both what they are trying to do, and what they are actually doing. 
That this hasn't happened in the last 35 years is a really unfortunate commentary on 
the lack of a maturation process for computing. 


Alan Kay 
Still trying to learn how to think better - 4y 


When Alan Kay talks about the size of the SAGE and the TX-2 computer, he 
uses the terms like "the size of the football field" and "the size of a 
building", what does it exactly mean? 

Here's an example of a SAGE four story concrete “blockhouse” with “power supply” on 
the first floor, the two large Q-7 vacuum tube computers on the second floor (they 
ran in tandem as one system), the “situation rooms” with enormous display projection 
equipment (Hughes and Eidophors), and on the fourth floor, the many display and 
light gun consoles for surveillance and tactics, all next to a football field. 


Here's a row of SAGE consoles with an Air Force staff sergeant using a light gun to 
point at the display: 


About 30 of these blockhouses were built (most by the end of the 50s) and distributed 
around the country. The last one was decommissioned around 1981 or so. 


Question for Quorans: where did they get the replacement vacuum tubes for the Q-7 
computers the last 10 years or more? 


Alan Kay 
Worked at The Walt Disney Company - 4y 
What is it like to work at Disney as a programmer? 


I wasn't a programmer there for the five years I spent at Disney, but the company 
had invited me to bring my research group to be part of Walt Disney Imagineering, 
and (partly because I knew and admired some of the old time Imagineers) we 
wound up making the transition from Apple in late 1996. 


The question is asked in the present tense — and about a showbiz company that has 
transformed itself several times since the Michael Eisner years that we were there. So I 
couldn't say anything either about programming there or even working there. 


I can say that in the 1996-2000 years “the creatives” at Disney were just great to work 
with, and that “the suits” were a mixed bag at best, with some of them being quite 
awful. The previous sentence was generally pretty true of movie studio companies, 
and is likely as true today. 


Alan Kay 
Still trying to learn how to think better - 4y 


Why is the guitar "easy to learn but difficult to master"? 


The “difficult to master” part is common to all music because people who are deeply 
involved with music don't just make it, they explore it, extend it, invent and reinvent it. 
Like sports, this pushes the boundaries towards human limits to the extent that even 
highly talented individuals have to do a lot of practice over many years to get near 
“mastery” (I don't know of any really great musicians who would claim to have 
mastered their art). 


Ken Hower — in these answers — says it well. 


Many years ago I taught guitar to a wide range of students of different backgrounds, 
ages, motivations and psychological types. The trade-offs in the early stages are some 
literal fingertip pain, but with the close reach of a few relatively easy chord patterns, so 
that something that overlaps with most aspirations can be gotten to right away. 


For younger children and some adults I suggested that they try ukulele first: the chord 
and note shapes are the same as the top four strings of a guitar, the touch is less 
painful, and much of the other musical learning of rhythms, listening, accompanying, 
etc. have an easier start-up ramp. This is especially true for both the baritone uke and 
the slightly larger 4 string “tenor” guitar. 


The relatively easy “can do something early on" for guitar — compared to violin, 
trumpet, etc. — is also a bit of a barrier because what actually needs to be learned is 
just as onerous and requires just as much work and time as any other instrument (for 
the reasons in my first paragraph). I found, even with children, that this could be 
explained, and most of them could deal with the ideas: it's going to hurt for a few 
weeks and try to just soldier through it, you will be able to do X in a month or two, but 
Y will take a few years. Every guitar player goes through this, and you can too. 


I'm guessing that things are not so different almost 60 years later, but now we can add 
in keyboard synths as “can do something early on” that can soon provide barriers for 
really learning to play keyboards (and music). 


Alan Kay 
Still trying to learn how to think better - 4y 


What will Silicon Valley do once it runs out of Doug Engelbart's ideas? 


I'd like to see. 


However, we are in no danger at present. Silicon Valley has more or less ignored or 
abandoned many of Doug's best ideas, so they are still around to be rediscovered and 
put forth. 


Right now we are also not in any close prospect to seeing his ideas understood and 
used, which is really a shame. 


Alan Kay 


Have designed a few programming languages - 4y 


Which is the best book for teaching programming to kids (11 to 16 years)? 


The “best book” right now might not be good enough. I don't know of a “really good” 
book for teaching programming to 11-16 years of age, but there might be one — I'd 
love to see what other Quorans think are good books. 


A deep question here is “plasticity” with regard to learning a subject. We know that it 
is not a good idea to let children do deep learning in music — especially playing, 
especially classical — without helping them develop good technique (it can take years 
to undo bad technique, as I well know, even with a lot of will power, and it usually 
doesn't happen). 


However, it is likely that many subjects will be plastic enough to allow several starts. 
On the other hand, “what is learned first” still often rules further development in 
various ways, especially when stress is present. 


I don't know of definitive studies regarding most subjects, including programming. 
There is a lot of casual and anecdotal evidence that many programmers fasten 
strongly to the first styles of programming they learn, and not only try to use these 
styles thereafter, but also oppose learning new styles if they conflict with their Ur- 
style. 


If so, then most of the ways to teach programming — in general, and especially to 
children — are likely disastrous (as with both math and science, etc.) 


Age 11/12 used to be considered the “model airplane” age/development stage (back 
when model airplanes were built from very discrete parts (often cut from balsa wood 
with razor blades)). It was also the “Tesla coil" and “ham radio" stage. 


This is because new abilities for planning and organized doing start to show up at this 
developmental level — and quite a bit more fine muscle learning and control. 


In our experience over the last 50 years or so, this is also a great age to really get into 
programming. 


A key property of “model airplane", “Tesla coil", “ham radio” is that they are all 
reasonable sized systems, so that making them simultaneously involves “nuts and 
bolts" *and* seeing how a whole system is designed, organized and built. 


To me this is absolutely critical. 


We can also look at reading and writing with a similar eye, and we'll see that a key 
here is *literature* — that is: contact with well designed things made by others that 
engender both pleasure and learning larger scale ideas and organizations. Trying to 
teach reading and writing outside of the context of *ideas* and *examples* and 
*usage* is not just a bad idea, it's kind of crazy. (This does not prevent schools, etc. 
from being stupidly ham-handed about many subjects!) 


And this is where most instruction in programming falls down badly. Much of it 
combines “ideas that are too simple to scale well” and “absence of larger 
organizations that show what the real deal is”. 


This connects with a recent answer I gave to: Shouldn't object-oriented “design” be 
studied before object-oriented "programming"? I'm sick and tired of these useless toy 
examples in these programming books. Why shouldn't we learn to design first and 
then implement? 


So: we need *big ideas* (could be from math, science, computing, etc.), we need live 
working examples of the ideas, that also *contain ideas*, that can be safely taken 
apart, explored, added to in meaningful ways, etc. We need a language and system 
that is “graceful” about being a substrate for all these. We need helpers, both 
human and system, that can provide insights when needed, etc. 


Why not? 


Alan Kay 
Have designed a few programming languages - 4y 


What's the difference between applicative, recursive, and imperative 
programming? 

Programming is providing a mechanism with what's needed to achieve a goal we 
have. 


Sometimes the mechanism can be dealt with very directly. Sometimes things will work 
out better if we first get the mechanism to act like a more convenient mechanism that 
we can then provide “what's needed to achieve a goal we have”. 


In the earliest computers, and at least well through the 60s, the code written by a 
programmer was called “instructions” (and often “orders”). The programmer was 
ordering the computer to do this and that, and the computer processes usually did 
the same to “data”. This was rather like cooking: with recipes, a cook, ingredients and 
tools. All of this is what would be called “imperative”. The simplest example is a store 
instruction to a location in memory, which obliterates what was there and 
replaces it with something else. 


Some realized that this didn’t scale well at all, and one of the early machines that 
could protect against willy nilly imperatives was the Burroughs B5000, much of which 
was the creation of Bob Barton. 


Another idea which appeared early was that of “subprocedure"/"function” as bodies of 
code that expressed useful relationships, and would often employ parameters. The 
distinction between the two was that “function” should be like a mathematical 
function (a) it would create a value, so the function could be used where a value would 
be used, and (b) ideally, for the same inputs it would always produce the same output. 


This led to the question of whether you could/should call such a subprogram while 
one was already executing it (e.g. FORTRAN couldn't). But there were any number of 
problems whose parts resembled the whole (they were “fractal” in one sense or 
another), and the idea gained ground. Algol 60 was to be a kind of multipurpose 
programming language — so you could be both “rather imperative” and "rather 
functional" in it — and the subprograms could be called “recursively” from within 
themselves. On most computers of the day this required software to simulate the 
hardware that would be required (the B5000 had these mechanisms directly in the 
hardware). 


I'm describing in this fashion to make the point that you could do “recursive 
programming” and “imperative programming" in the same subprogram in a language 
like Algol 60. 
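
A rough sketch of that mixture, written in Python rather than Algol 60 purely for convenience: the hypothetical `total` below uses an overwritten variable and an explicit loop (imperative) and also calls itself on sub-parts that resemble the whole (recursive).

```python
def total(items: list) -> int:
    """Sum the integers in a list that may contain nested lists."""
    acc = 0                     # imperative: a variable that gets overwritten
    for item in items:          # imperative: explicit step-by-step loop
        if isinstance(item, list):
            acc += total(item)  # recursive: the part resembles the whole
        else:
            acc += item
    return acc
```

For example, `total([1, [2, 3], [4, [5]]])` walks the outer list with the loop and descends into each nested list with a recursive call.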


Earlier, in the 50s, it was realized — especially by John McCarthy — that many 
attractive parts of basic relational mathematics could be shaped into ways to program 
computers that could be more convenient, less error prone, more amenable to 
“thought and proof” etc. On the machines of the day, this required the simulation of a 
“more mathematical” machine that could handle ideas such as functions operating on 
functions (“application”) etc. 


In McCarthy's way of thinking, you could make something like “sets” (really 
“collections”) by defining them as “a something” possibly followed by a “collection” — 
John did this in a clever way by making his base definition a “pair” of one thing 
hooked to another, and used this to get everything else. This is a “recursive definition” 
of a sequence. 
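
Here is a minimal sketch of that recursive definition, using Python tuples as the “pair”. The helper names `cons`, `first`, `rest`, `length` and `from_items` are assumptions made for this illustration, and `None` stands in for the empty collection.

```python
from typing import Any, Optional, Tuple

# A sequence is either None (empty) or a pair: (one thing, rest of the sequence).
Pair = Tuple[Any, Any]


def cons(head: Any, tail: Optional[Pair]) -> Pair:
    """Hook one thing onto an existing sequence."""
    return (head, tail)


def first(seq: Pair) -> Any:
    return seq[0]


def rest(seq: Pair) -> Optional[Pair]:
    return seq[1]


def length(seq: Optional[Pair]) -> int:
    """The function follows the shape of the definition: empty, or pair."""
    return 0 if seq is None else 1 + length(rest(seq))


def from_items(*items: Any) -> Optional[Pair]:
    """Build a sequence out of pairs, starting from the last item."""
    seq = None
    for item in reversed(items):
        seq = cons(item, seq)
    return seq
```

With these, `from_items(1, 2, 3)` is the nested pair `(1, (2, (3, None)))`, and everything else about sequences can be written in terms of `first` and `rest`.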


His definitional scheme was extended to thinking about how one could define the 
language compactly in itself, and this led to a nice representation of functions 
themselves and interesting questions about what it might mean to use them as 
parameters (for example, if a function had a free (global) variable, what would it mean 
to pass the function as a parameter to be used eventually in a different context?). 
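
The question can be made concrete with a small Python sketch. All names here are hypothetical, and Python's answer is only one of the possible ones: it looks up the free variable where the function was defined, whereas early Lisps looked it up in the context of the call.

```python
scale = 10  # a free (global) variable as seen from inside `scaled`


def scaled(x: int) -> int:
    return x * scale  # `scale` is not a parameter; it is found outside


def use_elsewhere(f) -> int:
    scale = 2         # a different `scale`, local to this other context
    return f(3)


result = use_elsewhere(scaled)
```

In Python `result` is 30, because `scaled` keeps seeing the `scale` from where it was defined. Under the other rule, where the calling context wins, the same program would give 6.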


The treatment of functions as “values” that could be passed around shows up clearly 
in John's definition of “apply” in Lisp, a function that would “apply” a functional value 
to a list of parameters (which could themselves be functions, etc.), to create a new 
environment (by binding parameters to functional variables), so that the meaning of 
the function could be found via “eval”, and eventually a result would be found. 
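
The interplay of “apply” and “eval” can be sketched in a few lines of Python. This is a toy under stated assumptions, not McCarthy's definition: expressions are nested Python lists, variables are strings, the names `evaluate`, `apply_fn` and `GLOBAL` are invented here, and each function value remembers the environment it was created in, which is a simplification of my own.

```python
import operator


def evaluate(expr, env):
    """Find the meaning of an expression in an environment."""
    if isinstance(expr, str):       # a variable: look it up
        return env[expr]
    if not isinstance(expr, list):  # a constant such as a number
        return expr
    if expr[0] == "lambda":         # ["lambda", [params], body]
        _, params, body = expr
        return ("closure", params, body, env)
    fn = evaluate(expr[0], env)     # otherwise: a function application
    args = [evaluate(arg, env) for arg in expr[1:]]
    return apply_fn(fn, args)


def apply_fn(fn, args):
    """Apply a functional value to a list of already-evaluated arguments."""
    if callable(fn):                # a primitive supplied by the host
        return fn(*args)
    _, params, body, env = fn
    new_env = {**env, **dict(zip(params, args))}  # bind parameters
    return evaluate(body, new_env)  # meaning of the body in the new environment


GLOBAL = {"+": operator.add, "*": operator.mul}

# A function that takes another function as a parameter and applies it twice.
twice = ["lambda", ["f", "x"], ["f", ["f", "x"]]]
add3 = ["lambda", ["n"], ["+", "n", 3]]
result = evaluate([twice, add3, 10], GLOBAL)
```

Here `add3` is passed around as a value, bound to `f` in a new environment, and applied twice to 10, so `result` is 16.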


McCarthy's ideas were so neat and clean that admirers — like Christopher Strachey 
and Peter Landin — looked much harder at ways to unify the simpler imperative 
schemes in the much safer much more powerful applicative frameworks. (One of the 
best of these is just a bit too far out of the scope of this note.) 


Alan Kay 
Have designed a few programming languages - 4y 

Shouldn't object-oriented “design” be studied before object-oriented 
“programming”? I'm sick and tired of these useless toy examples in these 
programming books. Why shouldn't we learn to design first and then 
implement? 


I like Marvin Minsky’s solution to the problem of teaching programming (in this case 
teaching Lisp to grad students in the 60s). He would give them the previous year's 
PhD theses and ask them to find how the thesis did something, and then make a 
change to it. Sometimes this would be solo and sometimes with several grad students. 


Similarly, in the early days of Unix at Bell Labs (when it was much more like its original 
philosophy), you would learn to program in both C and a variety of mini-languages 
influenced by SNOBOL by examining working systems and making changes to them. 


And at Parc with Smalltalk ... 


In all these cases you are writing small programs to begin with, but in the context of 
more sophisticated designs. This is very much like learning structural carpentry in the 
context of building a house: the apprentices get small tasks initially but can see how 
the whole house is being organized, and they get to see a variety of design principles. 


Both the Lisp and Smalltalk situations were dealing with live incrementally improved 
systems, and so were especially amenable to this kind of learning (and Unix was 
“semi-live", in that the Unix processes were live and the C programming was "a bit less 
than live"). 


Alan Kay 
Still trying to learn how to think better - 4y 


Does Alan Kay think about how research into human biology and medicine 
is conducted? 


I'm not at all sure about the center of this question. Perhaps the questioner can clarify 
in a comment ...? 


At least at a shallow level, I would say “yes, I do”. However, I'm sure biologists consider 
me “lapsed” at this point (I certainly do, though I try to keep up via reading). I also 
occasionally visit bio-research centers, like Janelia Labs, and seem to be able to 
still hold up my end of a conversation with molecular and cellular biologists (though 
just how polite they are being I can’t tell). 


I think I'm less up to speed with research in “human biology and medicine”. For 
example, I'm aware of a number of “pretty good” simulation models for cellular level 
organisms, but am not aware of good simulation models of human systems (for 
example, the human endocrine system is quite complex, and the immune system even 
more so). Another area that I've grazed but don't feel at all definitive about is the 
human microbe ecology (of some 10,000 species and counting) that we are the hosts 
for, and for which we still have very little idea of the function of more than a few 
species of the 90 trillion cells without our DNA that inhabit us. 


I'm a bit of a hypocrite (or at least a do nothing) with regard to using animals for 
testing. I don’t like it at all for many reasons, but I've never been part of a protest or 
signed a petition. The most I did was to refuse to sacrifice animals while I was a 
student. I have a similar double standard with regard to meat: I eat it, but I grew up 
on a farm and got to see the whole process, and didn’t think it was good. (I just don't 
think we can justify that kind of arrogance and expedience. And nowadays, it is quite 
possible to raise “meat without a brain attached” and we should do so. This would 
also deal with the greenhouse gases associated with meat animals (including 
chickens)). 


I'm a scientist — even if somewhat lapsed — and so simultaneously love the 
knowledge about all manner of things that science has found and is finding. A large 
part of my lifelong interest in “cultural changing education” is from my dismay at how 
many humans find the weapons potential in any inventions that increase power. This is 
a disastrous route, and we are dangerously far along it. 


Alan Kay 
Still trying to learn how to think better - 4y 


What lessons were learned in aspiring towards the DynaBook, and have any 
of its original goals become dated? 


A big subject over many years (the Dynabook idea was formed and fleshed out 
starting in 1968 with a cartoon I drew and a cardboard model I made, so more than 
50 years ago now). The first public paper I wrote about this was in 1972. 


The First Dynabook idea — Fall 1968 
Alan Kay 




The large idea that motivated the conception of a highly portable tablet computer 
with a touch-sensitive flat-screen display with stylus and wireless networking was 
catalyzed by seeing how Seymour Papert was helping children learn to do deep forms 
of real math by inventing a programming language — Logo — aimed to match up 
with the kind of thinking they can do at various developmental levels. 


One of the ways this translated to me was that an interactive computer could be 
thought of like an active book. A larger idea — a “medium” — than the tool-based 
and “vehicle"-based metaphors of the 60s. And this brought forth all the connotations 
of literacy and civilization inducing changes of thinking brought by writing and the 
printing press. This idea fit very well into the ARPA ideas of inventing “interactive 
intellectual amplifiers pervasively networked worldwide” and widened it out to 
thinking of the computer as a “meta-medium” and that in any considerations of a new 
“literacy” whatever it was had to most strongly include children. 


Many of the ideas for what a Dynabook should be able to do — and how — were 
adapted from the advances that the ARPA community had made from the early 60s to 
1968. I thought that a language with the flexibility of Logo that was aimed at children
in a similar way — and also included object-orientation for several reasons, including 
simulations; and that had some of the facilities of PLANNER to allow the kids to make 
systems that could reason — was called for. 


Pedagogically I wanted to make “a world” that had the environmental properties of a
Montessori school (with some echoes of Dewey also), and that was based around the 
ideas of Jerome Bruner and Seymour Papert. My main aim was to help children learn 
“science as both sciences and humanities”, so it was a bit wider than aiming at "math" 
(but included math). One way to think about a Dynabook is that it is mainly centered 
about all aspects of user interface design, especially for children, and not just about 
how to access, learn and use a computer but how to access, learn and use ideas. 


One lesson learned was that Montessori was “even more right” than I thought back
then. Her approach was very rich and very tuned to what human beings — and 
children — actually are. Her deep emphasis on design of environment as a living 
version of the epistemological stances she wanted the children to absorb — and 
especially “designing the adults” in this environment — have really stood the test of 
time. In the late 60s, we hoped that a Montessori environment could be made within 
the computer itself, and that it would not be necessary to do the extensive finding and 
training that was required to create a Montessori teacher. 


After 50 years, I think that she was more right than we were. One of many perspectives
on her approach was that she wanted to help many more children become real 
autodidacts. We on the other hand generally assumed that more children would 
become autodidactic just given a better environment. But a lot more is needed, and 
much of the motivation many children need for deep learning is social, especially
from adults, and that doesn't fit well into media, even with much better AI than we have
today.


But there is a chance that adults and peers can be motivators, as a kind of partner with
computer media that is yet to be invented. If so, that would break a huge logjam
today.


I don’t think the original goals of “helping children grow up to think much better than
most adults today” have become dated. The last 50 years only provide more evidence 
for how desperately this is needed — and the next 50 years could bring the enormous 
penalties for not taking this seriously. 


I did allude slightly to the “cognitive sugar water” problems of making a new widespread
networked portable public medium, but assumed that we could convince
educators to include warnings and sensibilities about all media — writing, television, 
dynamic media, etc. — as part of a new curriculum for the 20th and 21st centuries. 
They didn't get it, are really suffering the consequences of this (as is the country and 
world), and it's not clear that they yet understand what has been and is happening. 


As we progressed with experiments and user interface design in the 70s, we made up 
many slogans as mantras to help simplify the very complex business of designing for 
human beings. One was “The end-users are not like us”. Another was “Doing with 
Images makes Symbols”. And so forth. 


Something that I think we understood at the time, but didn’t put a lot of effort into
was the difference between “instrumental reasoning” and “larger scale thinking”. 
Humans have the former baked into them by their genetics but not the latter — so we 
should have made “other than instrumental” a larger part of our approach. Similarly, 
we have “coping” baked into us by genetics, but not “progress” (the latter was almost 
an invention — perhaps as late as the 18th century — to even make the idea visible). 


There's lots more that could be said (history is difficult because the past is so 
detailed), but I can sum up an answer to the question by saying: We learned many
lessons, most of which were about human nature and how most people react to
deep learning situations, and I think the original goals are still good, but could use a
few more for better balance.


Alan Kay
Still trying to learn how to think better - 4y


Will technological advancement continue as quickly as it has from the 
2000s to now? 


The keyword here is “advancement”. I think of the most important use of this in
phrases such as “advancing the human condition”, etc.


If we take the trouble to try to understand our own species, via anthropology and
its spinoffs, then we can see that quite a bit of the technology over the past
hundred or more years could be considered “inverse vandalism”, i.e. making things just
because you can (consider that every one of the 100s of 1000s of items in a flea
market once was cared for so much that effort was put forth to produce, and usually
mass produce, them).


More critical have been the invented communications technologies. Writing changed
thinking, and the printing press created enough people with changed thinking to
change societies. I think this was a good change, but the Catholic Church and other
such systems would not agree.


Electric and electronic communications systems changed latency of information 
transmission, but much of it also allowed oral and visual modes of thought to start 
replacing many of the literate modes that had grown up around writing and printing. 


I think this replacement is bad, but marketing people, business in general, and 
authoritarian wannabes, would not agree. 


A better ploy would be to have public education keeping pace not just with 
technology, but with understanding of the species they are supposed to be helping to 
learn things. This has been an abysmal failure. All communications systems need to be 
not just learned, but to be treated as powerful (and hence also dangerous) tools that 
will affect us as we try to use them (as Thoreau said “We become the tools of our 
tools”). 


This means that society is losing the battle of understanding what is happening to us. 
Ideally, in our day and age, the increased understanding that science has of many 
things should be factored into our societal planning and learning. When this doesn't
happen we are letting both invention and the industrial revolution’s ability to make a
lot of things very inexpensively and distribute them overwhelm us with great
powers unbalanced by the perspective with which to use them (or decide not to).


So, I think that technology will progress, because it depends on what we
understand about the physical world, and that humanity will regress, because we
generally refuse to try to understand ourselves and how we need to learn to handle
our new powers.


Alan Kay 
Still trying to learn how to think better - 4y 


What can I do to develop my explaining skills? 


Answering Quora questions for which you are willing to do the necessary work is great 
practice. 


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 4y 


Has a robot ever smoked? 


I think the robot shown at the 1939 World’s Fair (“Elektro”?) smoked as part of the act.
Yep.


Alan Kay
Still trying to learn how to think better - 4y


What do some not understand or know about the difference between an 
amateur and a professional? Has this ever affected you personally? 


I've never liked these terms (and have been guilty of using them myself). 


The two main definitions of “amateur” are “not paid” and “not good”. “Professional” 
means “paid” and generally connotes “good”. 


But the French word from which the English term is derived means “lover of". 


We get: 


lover   paid   ability

 no      no     poor
 no      no     good
*no      yes    poor
 no      yes    good
 yes     no     poor
 yes     no     good
 yes     yes    poor
 yes     yes    good


Most of these are useful, and some of them could use special terms. The * flags one 
category that I don't like to see in any pursuit. Both of the italicized ones are
admirable (and they do both exist thank goodness!) 


Alan Kay
Still trying to learn how to think better - 4y


How do I modulate from E major to B flat major?


Lots of ways — let's see if we can get answerers to contribute some of their ideas. 


I take “modulate” here to mean winding up in the key of Bb major and feeling that as
a key center (as opposed to just getting deftly to the Bb major chord). 


It's worth trying to figure out ways from scratch before looking to see how e.g. 
Schubert did it (he was a great sleight of hand modulator). 


There is also a very useful little book that has 1 bar, 2 bar, etc., modulation schemes 
for most possibilities. 


A good cadence is V-I, and the V chord (or V7 chord) for Bb is F | F7. If we can get to 
this, then we can really cement the key change feeling to Bb. 


It's probably too brute force to just go from E to F. 
C7 is the V7 chord for F. So how could we get there? 


It shares a note with E, and going to the C while holding the E is a nice change. If we 
look at the voice leading, the G# would go to G, and the B would go to Bb or to C or 
both. This change is known to work well. 


More subtle would be to go from E to C#m (its relative minor, which also has the E 
note), and still holding this note, go to the C7, and then onto F7 and to Bb. 


A faster route would be to do E, F#m (this is the ii chord in E), then holding the A in it, 
go to F7... 


And so forth. 


Hard to leave questions like this alone ... 


As Jeremy Cotten pointed out in his comment, one of the fun things about this 
question is that the tritone interval between the two keys gives rise to both distances 
and similarities. For example E7-5 has the same notes as Bb7-5. Even better, their
V7-5 chords also have the same notes: so B7-5 is also F7-5. A typical jazz cadence
using this substitution would be “blah blah B7-5 F7-5 E”. You could try “blah blah
B7-5 F7-5 Bb”, and it will be a little shaky but workable.
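

The "same notes" claim is easy to check by treating notes as pitch classes (0 to 11, with C = 0), so that enharmonic spellings such as Fb and E collapse to one number. The following is only a minimal sketch: the table `NOTE_TO_PC` and the helper `seven_flat_five` are made-up names for illustration, and "7-5" is read here as a dominant seventh chord with a flatted fifth.

```python
# Pitch classes 0-11, with C = 0. Enharmonic spellings collapse (Fb == E).
NOTE_TO_PC = {"C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4,
              "F": 5, "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8,
              "A": 9, "A#": 10, "Bb": 10, "B": 11}

# Assumed reading of "7-5": root, major 3rd, diminished 5th, minor 7th.
SEVEN_FLAT_FIVE = (0, 4, 6, 10)

def seven_flat_five(root):
    """Return the set of pitch classes in root7-5 (hypothetical helper)."""
    base = NOTE_TO_PC[root]
    return frozenset((base + step) % 12 for step in SEVEN_FLAT_FIVE)

# Chords whose roots are a tritone apart contain the same pitch classes.
print(seven_flat_five("E") == seven_flat_five("Bb"))
print(seven_flat_five("B") == seven_flat_five("F"))
```

The interval pattern (0, 4, 6, 10) maps onto itself when shifted by six semitones, which is why any two 7-5 chords a tritone apart coincide.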


Another tritone approach is to listen to how they are used in Scheherazade, and note 
that if you pop between E and Bb with the same melodic motif in each (one of these
in Scheherazade is E B E | Bb F Bb | E B E | etc. ...) you soon destroy E as the tonal
center, and this could allow you to just take off from one of the Bb landings ...


Another route to finding modulation paths is to look at “voice leading” possibilities. 
For example, writing a melody from E up to Bb could go like E F# G# A Bb, or E F# G A
Bb. Try harmonizing these. Or you can write a descending bassline, E D# C# C Bb,
and harmonize it.


You don't have to voice-lead to roots (though a nice bass line can be very convincing).
A fun game is to get from a chord to a next chord by just changing one note at a time,
by no more than two semitones. You can find ways to get from E G# B to Bb D F.
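

If you want to see what the game looks like mechanically, here is a rough sketch that searches for a shortest chain of moves. Everything in it is an assumption made for illustration: chords are reduced to pitch classes (C = 0), octaves are ignored, two voices are not allowed to land on the same pitch class, and the function name `one_note_paths` is made up. It says nothing about whether the in-between chords sound good, which is the real point of the game.

```python
from collections import deque

NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "G#", "A", "Bb", "B"]

def one_note_paths(start, goal, max_step=2):
    """Breadth-first search for a shortest chain of chords from start to goal.

    Chords are tuples of pitch classes (0-11, C = 0). Each move changes exactly
    one voice by 1..max_step semitones up or down. The goal is reached when the
    chord contains the same pitch classes as `goal`, in any voicing.
    """
    target = frozenset(goal)
    queue = deque([(tuple(start),)])
    seen = {tuple(start)}
    while queue:
        path = queue.popleft()
        chord = path[-1]
        if frozenset(chord) == target:
            return list(path)
        for voice in range(len(chord)):
            for delta in range(-max_step, max_step + 1):
                if delta == 0:
                    continue
                moved = list(chord)
                moved[voice] = (moved[voice] + delta) % 12
                nxt = tuple(moved)
                # Skip chords where two voices collapse onto one pitch class,
                # and chords that were already reached by a shorter chain.
                if len(set(nxt)) < len(nxt) or nxt in seen:
                    continue
                seen.add(nxt)
                queue.append(path + (nxt,))
    return None

path = one_note_paths((4, 8, 11), (10, 2, 5))   # E G# B  ->  Bb D F
for chord in path:
    print(" ".join(NAMES[pc] for pc in chord))
```

Each printed line is one chord in the chain. Try harmonizing around the intermediate chords, or add your own filter so that only chords you like are allowed as stepping stones.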


If you are going to try voice leadings, you could first write both bass and soprano lines 
until you get something that sounds good, then see what the inner voices might be (a 
lot of very old music used this + some rules to help). 


You can try other cadences. For example you could try IV to I. This would be Eb to Bb,
so how can you get to Eb from E? (Lots of ways, try some). Using the above ideas you
could even do E E7 E7-5 Eb.


I quite love the iv min 6 to I (it is a sleight of hand one). You could notice that Eb minor
in tempered tuning is the same as D# minor, and this is the iii chord of B. D# minor is 
also the relative minor of F# (which is the V of B, and the V of V of E). 


Just to pick an example: I think the first section of 17 bars in Fauré’s Requiem is
cosmically sublime. His progression over the final 5 bars of this is: 


Eb/Eb F7/Eb | Gb/Db Db7/Db | F/C | A7/C# Dm6/D | A 


Notice that this is the modulation that the question asked for, given as a “path to 
transcendence” by a composer whose brow was touched by angels. 



It's worth noting the voice leading of the top and bottom lines. The target in both 
cases is the A. The top line is reaching for it. The bottom line is in contrary motion 
until the top line reaches the A, then the bottom line also reaches upwards ending 
with an “Amen” 4th interval. The altos get to do the magical 6th. The tenor line is also 
worth noting. The dynamics taper to a hush. The whole effect couldn't be more 
graceful and awe inspiring. 


I can't resist pointing out one more wonderful idea here. In the 3rd bar from the end
there is an F/C, and then a first inversion A7/C# which really “wants to” go to a D
chord of some kind. That an A major will be the final target almost immediately is
completely hidden. I used the word “graceful” above, and I think this is the epitome of
grace in more ways than one.


Also take a look at Daniel Michalak’s comment below for more illumination about the 
voice leading. 


Transpose these five bars up a semitone, and it will go from E major to Bb Major as 
the question requested. 


So: there's a combination of existing lore and examples, plus a variety of music 
theories (not just tonal harmony) that you can use, plus noodling a lot, all of which can 
be combined to good purpose. 


There are lots of perspectives on this — my personal opinion is that “voice leading is 
the master principle” and thinking along these lines with some of harmonic theory 
whispering will turn up lots of beautiful pathways. 


Alan Kay
Still trying to learn how to think better - Updated 4y


How was the Xerox Alto done in only 3 months?


Basically: Chuck Thacker was simply amazing. 


I recounted some of the history of the Alto in “The Early History of Smalltalk”, and
both Butler Lampson and Chuck also wrote histories of that time that included
accounts of the Alto development (all can be found via the ACM, and I think they are
online also).


It was part of the ARPA research M.O. to “compute in the future rather than in the 
present” by either using existing supercomputers, or by making a kind of 
supercomputer. There was enough science and engineering substantiation to allow a 
reasonable faith that the supercomputers of the present would be the commodity 
computers of the future. Besides needing to try out ideas and designs in many areas,
there was also the brute fact that inventing and implementing new software ideas 
took quite a while to do — with really big dreams, a 10-15 year horizon was needed. 


I've written elsewhere in Quora a few things about the Alto, so I won't repeat them here.
Was the Xerox Alto a prototype or a finished product? 


To what extent did the Xerox PARC Alto inspire the Apple Macintosh? 


I think I might have also written about “MAXC”, the emulated PDP-10 that was the first
project done by the computer researchers at Parc (in 1971). Xerox wouldn't allow us to
buy a real PDP-10 (it was in the process of becoming the standard TSS computer on the
ARPAnet); instead they wanted us to use a Xerox Data Systems computer, so
everyone almost walked out right at the beginning.


We had in Butler Lampson very likely the best OS designer in the world — so we could 
do our own OS on the XDS machine — but even really good people need 3 or so 
years to do a good and reliable OS, so that was out. Then the discussion turned to 
“but we could make an emulated PDP-10 in less than a year” — and — “we could use 
and learn about the new VLSI DRAM chips from Intel", etc... 


Everything about MAXC (Multiple Access Xerox Computer) was quite different from
the DEC PDP-10 (including being much more reliable). This project was headed by 
Chuck Thacker, and required most of the computer researcher resources then at Parc. 
The result — in just 9 months — was both a successfully running mainframe machine 
plus technological leverage, especially with regard to the new DRAM. 


In parallel, the idea was floating that we would do some kind of a personal computer
look-ahead to have an “Interim Dynabook”, to explore serious UIs and local
applications, to try distributed computing using local area networks, etc. We decided
that it would be worth sacrificing RAM to have a dynamic bitmap display (because
then we could show “anything”). We wanted “printing quality fonts” and bitmap
painting. I wanted dynamic 2.5D animation and 12 polytimbral voices of music
synthesis, etc. We built HW in early 1972 that allowed simulation of all five of these
aims. This gave us a pretty crisp understanding of what kind of HW power would be
required.


We knew that we wanted the microcode of the computers to be able to efficiently 
emulate byte-codes as fast as main memory could cycle — because we were in the 
process of inventing a variety of new languages that would be implemented this way 
(Smalltalk, Mesa, etc.). The parties involved decided in the summer of 1972 on a
scheme as to how that could be done nicely. 


We had all agreed that we would always “engineer for 100 users", so if we did a 
personal machine, we had to be able to manufacture 100 of them (in fact, close to 
2000 of them were eventually made). 


The actual project wound up being started — in late Nov 1972 — secretly because of 
some politics, and — as with a few other projects at Parc — as the result of a bet: 
Chuck said that a futuristic computer could be done “in three months" and a Xerox 
exec bet him a case of wine that it couldn't be done. 


Even though the DRAM was now understood, and the prior simulations established 
figures of merit for everything, there was still a lot of work to be done in just 3 
months! (For example: designing and building and debugging the first machine!) 


The Alto was put together partly using wire-wrap technology in one 19 inch rack.
Most of the boards were DRAM (128K bytes organized as 16 bit words), plus a very
fast couple of processor boards which held the 16 program counter, zero-overhead
task-switched microcoded CPU and register file. The backplane was also
wire-wrapped.


Most of the detailed design and much of the building was done by Chuck*. He had 
two technicians to help (Larry Clark and Mike Overton). Ed McCreight (another of the
truly amazing multitalent computerists at Parc) designed the disk interface.


Three months later: voila! We used to say: “Chuck just threw the parts at the wall and 
they fell down Alto!” 


From left to right: Larry Clark, Chuck Thacker, moi, Ed McCreight, and on the screen 
the Alto’s first image, a bit-map painting of the Cookie Monster on “Bilbo”, the first 
Alto. (Mike Overton, not pictured) 




A closer look at Bilbo at birth, the Cookie Monster, and the board monster 


The Alto could easily emulate 12 high quality polytimbral voices in real-time (and 8 
FM voices), and could handle two keyboards and pedals organ console input. 


The Alto could animate about 120 square inches of bit-map images at 10 frames per 
second in 2.5D. 


The happy user in his office at Parc. 


Chuck with his 2009 Turing Award bowl. He is one of those special people you can't 
praise too highly in all respects of his life. 


We were all shocked when Chuck went away and left us in 2017. It is a vast 
understatement to say that we miss him greatly.


* From the Alto Hardware Manual: 


“The Alto was originally designed by Charles P. Thacker and Edward M. McCreight, and 
was based on requirements and ideas contributed by Alan Kay, Butler Lampson, and 
other members of PARC's Computer Sciences Laboratory and Systems Science 
Laboratory” 


Alan Kay
Have designed a few programming languages - 4y


In object-oriented programming, why is it bad practice to make data
members public when the get() & set() public members modify it anyway?


If you use setters, you are not really working with objects, but glorified data structures.
This use of objects as “Abstract Data Types” is not in the spirit of Object Oriented 
Programming and Design. This misuse is one of the biggest misconceptions about 
what OOP is all about — and it also removes much of the power of organizing 
modules whose contents are sealed off from the outside. 


One way to think about “objects” is that each is an *idea* that holds possibly useful 
behaviors that can be *requested* (not ordered). So they are like servers with 
differential privileges. Since there are almost always many more ways to accomplish 
concepts than there are concepts — think of the *idea* of “sorting” vs the many ways 
to do sorting — it makes great sense to separate the "language of 
concepts/behaviors” from the ways to accomplish the concepts/behaviors. 


This allows many kinds of scaling and reformulations to be done both safely, and 
while the larger system is running (hint: your systems design is poor if you have to 
stop it to fix it or change it). 


Another way to look at this is that “data” in most imperative and static senses of the 
word needs to go away — it is much too fragile and exposed. 


You can request services, and some of these might be as simple as asking for a 
particular value. If the server is representing a “person” then you might request “date 
of birth” and get a date object as a reply. If you request “age”, you would expect to 
get a duration object back. In the first case the date object might be held directly 
internally (probably not), and in the second case the duration object is the result of an
on-the-fly calculation (which might be “continuous”).
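

To make the “person” example concrete, here is a minimal sketch in Python. The class name `Person`, its method names, and the dates are all made up for illustration. The point is only that the requester sees two requests and cannot tell which reply is stored and which is computed, and that there is no setter anywhere.

```python
from datetime import date, timedelta

class Person:
    """Answers requests; how it stores or computes the replies is its own business."""

    def __init__(self, name, born):
        self._name = name
        self._born = born   # internal representation, free to change later

    def date_of_birth(self):
        # Reply with a date object. It happens to be held directly here,
        # but requesters cannot tell and should not care.
        return self._born

    def age(self, today=None):
        # Reply with a duration object computed on the fly, never stored.
        today = today or date.today()
        return today - self._born

someone = Person("Example", date(2000, 1, 1))
print(someone.date_of_birth())
print(someone.age(today=date(2000, 1, 31)))
```

If the internal representation later changes (say, to a record fetched from somewhere else), the two requests stay the same and nothing outside the object needs to be touched.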


A more sophisticated use of objects would be to have a “goal pool” of things the
larger system needs done, and the server objects could be constantly looking at the
goal pool to find things to do. (This is a kind of “Publish and Subscribe”, or Linda,
type of process.)
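

A toy version of the “goal pool” idea might look like the sketch below. It is single-threaded and deliberately tiny. The names `GoalPool`, `Server`, `post`, `take` and `look_for_work` are hypothetical, and a real Linda-style tuple space would match on richer patterns and let the servers run concurrently.

```python
class GoalPool:
    """A shared bag of (kind, payload) goals, loosely in the spirit of a tuple space."""

    def __init__(self):
        self._goals = []
        self.results = []

    def post(self, kind, payload):
        self._goals.append((kind, payload))

    def take(self, kind):
        # Remove and return the first goal of the requested kind, or None.
        for i, (k, payload) in enumerate(self._goals):
            if k == kind:
                del self._goals[i]
                return payload
        return None

    def pending(self):
        return len(self._goals)


class Server:
    """A server object that looks in the pool for goals it knows how to satisfy."""

    def __init__(self, kind, behavior):
        self.kind = kind
        self.behavior = behavior

    def look_for_work(self, pool):
        payload = pool.take(self.kind)
        if payload is None:
            return False
        pool.results.append((self.kind, self.behavior(payload)))
        return True


pool = GoalPool()
pool.post("sort", [3, 1, 2])
pool.post("total", [3, 1, 2])
pool.post("sort", [9, 8])

servers = [Server("sort", sorted), Server("total", sum)]

# Every server polls once per round; stop when a whole round finds nothing to do.
while any([s.look_for_work(pool) for s in servers]):
    pass

print(pool.results)
```

Nothing that posts a goal names the server that will handle it, so servers can be added, replaced or removed without touching the posting side.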


Etc. 


Alan Kay
Still trying to learn how to think better - 4y


Why are guitars tuned with an open B when a C seems more logical?


An interesting range of answers in the previous 9. It's worth going back in time before 
the guitar to see how lutes, vihuelas, gambas, etc were tuned. 


Ans: all in 4ths except for one major 3rd somewhere — and as noted in some of the 
other answers: 4ths because long scale lengths make playing chords in 5ths tuning 
tough, and a major 3rd somewhere to collapse just enough to make more chords 
more doable and (perhaps) to wind up with a double octave on a bass note. 


The Renaissance lutes and vihuelas with 6 strings (actually “courses” — a “course” is 
usually a double string) have a tuning that is like a modern guitar except with the 
major third a course lower. Typical tuning from low pitch to high, with the major third 
between F and A: 


G C F A D G


Around the end of the 16th century, more bass courses were added. These were 
usually tuned to help provide bass notes for particular pieces. But a very popular 
addition was a D a 4th below the low G: 


D G C F A D G


Note that this actually “provides a guitar” tuning, reading from left to right plus an 
extra top string. Lute players who are also guitar players (or vice versa like me) often 
take advantage of this to easily play guitar music in this tuning (and note, not too far 
from actual guitar pitch). 
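

A quick way to see the “4ths plus one major 3rd” pattern, and that the low six courses of the 7 course lute line up with a guitar, is to count semitones between neighboring strings. This is only a sketch: `NOTE_TO_PC` and `intervals` are names made up here, and only natural notes are handled, which is all these three tunings need.

```python
# Pitch classes for the natural notes, with C = 0.
NOTE_TO_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def intervals(tuning):
    """Semitone gaps between adjacent courses, low to high (each taken within an octave)."""
    pcs = [NOTE_TO_PC[n] for n in tuning.split()]
    return [(hi - lo) % 12 for lo, hi in zip(pcs, pcs[1:])]

guitar = intervals("E A D G B E")
lute_6 = intervals("G C F A D G")
lute_7 = intervals("D G C F A D G")

# 5 semitones is a 4th, 4 semitones is a major 3rd.
print(guitar)
print(lute_6)
print(lute_7)
print(guitar == lute_7[:5])
```

The 6 course lute has its major 3rd one course lower than the guitar does, while the low six courses of the 7 course lute have exactly the guitar's interval pattern, a whole step below guitar pitch.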


Further bass courses were added to the Renaissance lute (and there were alternate 
tunings also, etc.) 


Around 1650, the French invented the "Baroque Lute” with a very different tuning on 
top: basically D minor chords: 


(Top 6 courses from low to high): A D F A D F


The logic here was to allow dissonances and close chord voicings. There is an 
enormous amount of lute literature written for the Renaissance and Baroque tunings 
(estimate is about 30,000 pieces). Almost all of it is written in tablature (so you need
a particular tuning to be able to read and play). 


Baroque lutes had lots of bass courses (up to 8 more) and these were tuned in a scale. 
So a typical tuning from low to high would be: 


F G A Bb C D E F G A D F A D F


The archlute would often have the same bass courses scheme, but with the top 6 
courses in Renaissance tuning (this is a very useful wide ranging instrument). 


The guitar came along in fits and starts, with a 4 course version in the 16th century, 
and in Baroque times, 5 courses. These often were tuned more like ukuleles, but
sometimes with a top course an octave lower. Again, 4ths with one major 3rd. 


There is a repertoire for these, but a lot of the use was to play chords to back up 
singing and other instruments. The guitar with 6 single strings started appearing 
around the end of the 18th century, and was in full use in the early 19th century. Many 
of these instruments were fitted with extra bass strings as with lutes. What we think of 
as the modern classical guitar happened around 1850 with a number of design 
changes by Torres: larger bodies, a different form of bracing, etc. 


Alan Kay
Have designed a few programming languages - Updated 4y


What does Alan Kay think of using either Squeak or Pharo as the first 
language for introducing programming? 


Helping someone learn something is a deep UI situation, and it has to start with the 
first word in UI. 


I think of the process as being quite similar to teaching a musical instrument (which I
used to do), and the best thing one can do in the very beginning is to get them to do
even the smallest “real playing” they can do. What this is depends both on who they
are and the nature of the instrument. This is very similar to the “bounce-hit” ideas of
Tim Gallwey’s “Inner Game of Tennis”. His point was that “beginning tennis” is mostly
“chase the ball”, so the question is how you can get an absolute beginner to “hit the
ball” 90% or more of the time right off the bat.


In other words, you want to structure things so that the learner starts as a kind of “low 
intermediate” with actions that bypass the fumbling and fright of “a beginner”. 


For 9 years old and up, a very good beginning is to use Etoys and the “Draw A Car and
Drive It” first project and do it one-helper-to-one-learner. In about 20 minutes
something quite fun gets done, and about 30 things are learned, about: objects, code,
variables, loops, multiple processes, etc.


There are a lot of considerations in UI, and the UI of teaching and learning, but if you 
just wanted to pick one focus to help design, it should be “reduce cognitive load”. One 
of the ways to do this is to use the screen real estate to augment the short term 
memory of the beginner, and show them all the ingredients, and have room to work 
with the ingredients all in “an eyeful”. 


The standard UIs for Squeak (and Pharo I think) are more organized for professional
programmers who already know how to program and want to create systems of 
medium to large scale. The use of the screen is aimed in that direction much more 
than what I'd choose for beginners. 


I think in almost all cases, most languages have weak interfaces for beginners. One of
the top UI principles is “do not make end-users suffer because of your laziness or 
ineptness”, so some real work would be called for to make a decent beginning 
environment for beginners. 


Bottom line is that I think there are enough actually important things that need to be
gotten deeply when learning to program, that all other obstacles should be eliminated 
via making a pedagogical language for first use. It can be really powerful, but it needs 
to be in the mental space of the people who are trying to learn. 


Alan Kay
Still trying to learn how to think better - 4y


If you worked on pre-Unix time sharing systems such as ITS, Multics, 
TOPS-10, Tenex, or others, can you describe some of your experiences? 


By accident, I entered the U of Utah Computer Science grad school in 1966, and found
that it was one of the “ARPA Research Universities” (I had never heard of ARPA, etc).


Dave Evans had set up the department the year before when he moved back to Utah
from having been head of the CS department at Berkeley (and of the ARPA research
“Project Genie” there, which was aimed at making a very efficient time-sharing system
on a relatively small computer (the SDS-930) by designing and making memory
mapping and protection hardware, and a new kind of operating system to run it all).


Two of the luminaries there — both students at the time — were central to this 
design: Mel Pirtle (who was the main HW designer), and Butler Lampson (the main 
designer of the OS). Peter Deutsch was another key for the design and the 
implementation. 


Project Genie turned out to be so successful that other ARPA projects wanted the 
machines also, and Bob Taylor — who was running ARPA-IPTO at the time — severely 
twisted the arms of SDS to make a commercial version of this machine with the Genie 
OS on it — the SDS 940. 


And this was so successful that — a few years later using the same OS design — BBN 
did HW and an operating system for the DEC PDP-10 that became Tenex. Later (1971) 
the MAXC “fake PDP-10" at Parc was created to run Tenex. And so forth. 


The SDS-940 machine was one of the main factors in the launch of the Time-Sharing 
industry. It is likely most famous and remembered for also being the machine on 
which Doug Engelbart and his group implemented the NLS system and showed its 
many impressive delights and approaches in “the mother of all demos” in Dec 1968. 
You can see this machine and OS at work in the online videos of the demo. 


One could also dial into Engelbart’s 940 and use it via teletype, and at Utah we made
quite a bit of use of this. As a grad student, I even taught a class using this system
(Ed Catmull was one of the students).


Part of the reason teaching a class was so attractive was that one of the subsystems of
Project Genie was CAL, an extremely efficient derivative of Joss (the first great
end-user system, which had been done at RAND in the early 60s). CAL was one of Butler's
many projects, and it was an incremental, on-the-fly, line-by-line compiler, so fairly hefty
projects could be done in it.


The specs for the 940 should be a bit shocking today. The machine had a grand total
of 192K bytes of main memory (as 24 bit words) and executed instructions at about
0.5 MIPS, but it could run 20-30 users quite handily. It was “designed to crash”.
Most OS software was designed “to work”, and recovery from crashes
could take quite a while. Butler's design OTOH did just the opposite: it spent some
resources trying to “clean the kitchen” as it worked, so that a crash would incur
minimal discomfort and could be recovered from in a few seconds most of the time.
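

To put those numbers in perspective, here is the back-of-envelope arithmetic as a small sketch. It assumes 1K = 1024 bytes and an even split of the processor among users with no OS overhead, neither of which is stated above, and the helper `words` is just a made-up name.

```python
BITS_PER_BYTE = 8

def words(total_bytes, word_bits):
    """Number of machine words that fit in total_bytes of memory."""
    return total_bytes * BITS_PER_BYTE // word_bits

# 192K bytes organized as 24 bit words (assuming 1K = 1024).
sds940_words = words(192 * 1024, 24)
print(sds940_words)

# Rough per-user share of a 0.5 MIPS processor, ignoring OS overhead and idle users.
for users in (20, 30):
    print(users, round(500_000 / users))
```

So the whole machine had 65,536 words of main memory, and each of 20 to 30 users got on the order of 17,000 to 25,000 instructions per second under these assumptions.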


Another really interesting wrinkle is that Butler's design rewarded programmers for
predicting their working set ahead of a time quantum (and punished them severely if
they missed, by running their process on the slow queue). This made an enormous
difference in efficiency because the OS could line up the next 3 or 4 users’ working sets
ahead of time and not need to do random paging from random page faults (etc.).


Alan Kay
Still trying to learn how to think better - 4y


As a computer scientist, how much math do you actually use?


Mathematics is a plural because many ways to “represent ideas to allow reasoning
about them” have been devised. That little “s” also implies that more can be invented
as needed.


What this means is that sometimes one can use already devised "maths" to help in 
computing, and that sometimes one has to be able to invent new maths to help with 
computing. 


I don't think a person can be a real “computer scientist” (where we take the term
“science” seriously) without being fluent in maths, including being fluent in being able 
to invent new maths when needed. 


I suspect that a lot of the disappointing computing we see around us has been
concocted by computerists who haven't really gotten themselves invested deeply 
enough in the real field (to the extent that there is one). 


Alan Kay
Have designed a few programming languages - Updated 3y

What is, in your opinion, the best textbook for self-studying computer 
architecture and programming “from the bottom up” (where the bottom is 
the logic gates level)? 

This is a great question! And I've periodically wanted to know the answer over the 
years — especially when I've wanted to teach “a whole systems” course. 


I'm in London for a few months more and away from my main library, so I can’t list the
books I've found in the past. I can say that none of them lived up to the
corresponding book in molecular biology: “Molecular Biology of the Cell” by
Bruce Alberts, et al. 




Over 1000+ pages, this takes the reader from elementary chemistry all the way
through how life works, in enough well written detail to be understandable. 


There are still a few supplementary books that help a lot (for example: “Cell Biology by
the Numbers” by Milo and Phillips). I think this would be OK for the current question.


Since I haven't looked for a few years, my plan here is to buy and read all the
suggestions made by the other answerers, and hope that a few good answers turn up.
I will then report on them.


What I'd like to see is a book that starts with the many ways to make logic gates — as 
Danny Hillis’ “The Pattern in the Stone” does — and then moves to abstractions — 
such as the “stick diagrams” in Mead and Conway's “Introduction to VLSI Systems” to 
sketch out simple memories, addressing schemes, data paths, etc. for something like a 
simple RISC architecture. The control for this could be a simple microcode (and later
more microcode could be added in a separate memory).


We want to get to software as quickly as possible so the hardware design here can be 
a lot of fun to choose just what will help SW without itself getting too tricky. 


Then we could “bootstrap a bootstrapping system” — could be a simple virtual 
machine in which symbolic processing to make a language could be done. And this 
could be used to write an operating system kernel that is expressed as the basic level 
objects in a protected object-based system. 


From here it would be easy to show how a modern UI and end-user tools could be 
made in a few thousand lines of code. 


This gets at the essence of the “chain of being” without having to delve into the 
needless and enormous complications of modern hardware and software, most of 
which is quite superfluous to both understanding and even most practical usage. 


(Edit on May 18th 2020) A few days ago I got asked questions like these by a CS
department in the US, and told them that the Nisan & Schocken book couldn't be
too highly praised in the amount and quality of effort put forth and the high quality 
results they got. 


And there is also the Patt & Patel book, which is also an excellent effort and result. 


That said, today I think I would engage a faculty to rethink these books along several
lines, and get their students to help — probably over several years of effort — to try 
to get to the next level of explanation. 


This is because there is a difference between justly giving cosmic praise and really 
loving the result (e.g. you really can't praise Simula or Lisp too highly, but the world of 
programming needs even better languages than these wonderful creations). 


For example — compared to Nisan & Schocken — I would certainly have the ultimate
SW targets be different and much higher level. I think there is enough experience and
lore to really pull this off. 


On the HW side, I (personally) would need a few months to think this through. Maybe
there are two versions. 


One would be the absolute minimum as far as architecture goes — but superfast with 
an enormous memory — to get SW going. This is the “tiniest Turing machine” idea. 


The second would be an example of what a minimal-but-practical architecture might mean
today (today's equivalent of Chuck Thacker's Alto at Xerox Parc).


Alan Kay
Have designed a few programming languages - 4y


Why aren't there any books about the history of modern computer 
programming (like 1970 onwards)? Most books seem to delve really far 
back and only just touch on the present and future of programming. 

I take your question to imply “good books” (there are plenty of pop culture books
about microcomputers onwards). 


The best book about Parc is actually about the ARPA research community started by 
JCR Licklider — “The Dream Machine” by Mitchell Waldrop — this has just been
republished by Stripe Press. Parc was both an outgrowth and an integral part of the
ARPA community, and this book is “pretty good” about providing a fairly good history
and context for the larger endeavor. A “hero's journey” book about Parc is “Dealers of
Lightning” by Michael Hiltzik. And there are others. 


For histories about programming per se, let me recommend the ACM “History of 
Programming Languages” series (now with 3 volumes). The first covered “dawn of 
history” until about 1970 or so and was published in 1978. The second was published 
in 1996. Etc. The ACM selected the languages, and did a good job bulldozing the
inventors of the languages to explain how they came about. 


Some of the most interesting languages did not get covered (the ACM criterion was
that they have been in use for more than 10 years) because they were exotic or were
ignored for other reasons. However, Jean Sammet looked at a lot of programming
languages in her book. 


Perhaps more to your point — which asks about “programming” — I don't know of
many books at all — or even papers — which deal with the history of e.g.
“programming large systems”. There's “Mythical Man-Month” by Fred Brooks about
programming OS 360. I think Howard Sturgis wrote a paper about the trials and
tribulations of CAL TSS (an OS for the CDC 6400 at Berkeley). 


I have seen several books about the history of Unix (which started in the early 70s
about the same time as Parc started up). I don’t know how good these are.


But, there's Xinu (Xinu Is Not Unix) which is a small Unix-like OS that is completely 
explained in several books, including how to build it. 


There's the Smalltalk “Blue Book” by Goldberg and Robson, which includes a hefty
section on how to make a Smalltalk from scratch. 


Alan Kay 
Still trying to learn how to think better - 4y 


What is the nearest modern equivalent to Bell Labs or Xerox Parc? 


I take the question as being mostly about “similar process” rather than particular
topics of focus.


I haven't found one that has the same feel. The two today that are worth looking at
more closely are Janelia Labs and the Max Planck Institute. I know the least about the
latter, but have heard very good things about their process. 


Most of the others mentioned in the other two replies so far are much more top down 
and “directed” than Xerox Parc was (and most of the much larger Bell Labs was). 


Further side points to ponder are that some of the labs today which don't have a 
similar process might very well have started with the same process. A small growing 
organization can add new and crazy people and groups. There is usually a size limit, 
and when that is reached, the only way you can continue “new and crazy” is to nudge 
the older groups out (this might be a bad idea). Also, there is the problem that the 
very process that jells a group to be highly cooperative, often establishes de facto 
leaders who might not be best for the next phase of “new and crazy”, and in any case,
they start looking a lot like management even when they are trying to avoid it ... 


Alan Kay
Still trying to learn how to think better - 4y
Why does technology tend toward complexity? 


In talks, I've made the distinction between “Complex” and “Complicated”, where 
“Complexity” is an intrinsic property of an organization of “ingredients” to satisfy a 
goal, and “Complication” is extrinsic — mostly due to poor human design, execution,
etc. (This is not an original idea, you will find other versions of this bifurcation.) 


The aim of “architecture” and “organization” is to have the result be able to do more 
than the ingredients. 


Note that — because of our nervous system — we can sometimes get “more” even 
when the ingredients don't themselves interact. For example, we can play separate 
sine waves whose frequencies are integer multiples and perceive a complex timbre, 
even though the ingredients are completely linearly independent. Here the 
“complexity” is mostly within us (and for useful evolutionary reasons). 


Most of the time, there will be interactions between the ingredients, often the 
communications paths will form feedback loops, and the results will be non-linear, 
often both literally and figuratively. The difficulties in predicting what such systems will 
do is also part of the measure of their complexity. 


It's worth noting that we plus our tools can often reduce the apparent complexity (if 
we have a real-time FFT handy, we can see much more about how separable the 
frequency ingredients of a sound might be ...) 


The real costs of “complications” in software are rarely incurred quickly, so these have 
been allowed to get out of hand in ways that real engineering in the physical world 
tries to deal with and squelch early in the game. The ease of linking and patching and 
storage capacity in SW allows a kind of “brinksmanship” that is appalling, and has 
opened the door to poorly prepared practitioners. 


Bottom line is that “complexity” is what we want to be able to handle, and we will have 
more brain cells for this if we do everything possible to limit “complications”. 


Alan Kay
Have designed a few programming languages - 4y

Is there an OOP approach that uses a "milieu", not an object graph, to send 
messages, similar to how my voice is carried through the air and hits people 
who happen to be in range, rather than being addressed to a specific 
person? 

A good early event-driven system was the “Sol” programming language by Knuth and 
McNeley (ca. early 60s).


You can take a look at a number of locational “particles and fields” object 
programming ideas. 


For example, this was suggested in my 1984 Scientific American article, and used in the
Playground language (which had "near" as a property) a few years later.


A nice version was in Mitchel Resnick’s StarLogo, that used “patches” as locational
environments. This was later used in “Wandering Letters” and other particle and fields 
systems, and adapted for the Etoys “Kedama” particle system. 


John Maloney’s new “GP” language (a wide spectrum up-design for Scratch) has
location sensing. 


Etc. 


Alan Kay
Have designed a few programming languages - 4y


Does Alan Kay see any new ideas in computing? 


“New” is not what I look for. “Ideas that make a qualitative difference over past
techniques” are what I'd like to see. 


Years ago, I'm fairly sure I was aware of pretty much everything regarding computing
that was going on in the world. Today, I’m definitely not aware of everything, so it's
reasonably likely that if there were something really great being done somewhere,
I wouldn't know about it.


I would be most interested in learning about “qualitatively more expressive”
programming that is more in line with top-level engineering practices of the 
CAD<->SIM->FAB systems found in serious engineering of large complex systems in 
the physical worlds of civil, electrical, automotive, aeronautical, biological, etc. 
engineering. 


In the CAD<->SIM part I'd like to see the designs understandable at the level of 
visualizable semantic requirements and specifications that can be automatically 
simulated (on supercomputers if necessary) in real-time, and then safely optimized in 
various ways for many targets. 


Isolating semantics in the CAD<->SIM part implies that what is represented here is a 
felicitous combination of “compact and understandable”. 


The FAB-part pragmatics are very interesting in their own right, and besides 
efficiencies, should be able to deal with enormous scaling and various kinds of 
latencies and errors, etc. 


The above would be the minimal visions and goals that I think systems designers
within computing and software engineering should be aiming for. 


I'm not aware of something like this being worked on at present, but these days this 
could be just because I haven't come across it.


Alan Kay
Had something to do with "Object-Oriented Programming" - 4y


Are shallow inheritance hierarchies better? 


In English we often say “This is like that except ...” and most of the time this is a plus if 
there is a good mapping between the two ideas and the “except” doesn't destroy the 
mapping. In inheritance we are in theory saying something safer: “This comes from 
that via ...” where the initial mapping is a form of equality or parametric equality, but
where the same care has to be taken with the “via”. 


If there are no “guardians of meaning” in our system, then we can hack our way into 
something as messy as we were trying to fix, and perhaps messier in a new and 
pernicious way. 


This is similar to using an assignment as “equality” — hugely useful — but if we don't 
guard it in some way we can destroy the idea of “equality”. 


Since the “spirit” of inheritance can be easily violated in most languages with a shallow 
hierarchy, my answer is that “shallowness” could be more helpful in understanding or 
avoiding a potential mess, but that’s all. 


On the other hand, I like useful abstractions a lot, so we might better try to think
about forms of “analogical descriptions” or “differential descriptions” that pay their 
way in clarity and ease of use. These issues are ontological, and it's worth looking at 
what ontologists have been able to do with representing ideas, especially with enough 
constraints to preserve the original comparisons and derivations. 


The first good attempt I'm aware of to do inheritance in comprehensive ways that 
might have some clarity was the “Flavors” system in the first versions of Lisp for the 
MIT Lisp Machine. This had “operations”/“relationals” that provided a bit of a logical
and algebraic feel, especially for multiple inheritance. (And provided the opportunity 
for nightmares of obscurity if hacking brinksmanship were done — which turned out 
to be the case.) 


One way to think about this is that one of the most important issues to try to control 
well is “dependencies”. I think that most of these need to be controlled semantically
and automatically — so something like the idea of “set of support” and “truth 
maintenance” in several of the better “expert systems” systems in the past (such as 
ART) where changes would cause ripples through the system to make sure that all 
dependencies remained valid. 


I wonder if there are any great examples these days in modern programming
languages? 


Alan Kay
Still trying to learn how to think better - 4y 


What are your thoughts on the free software movement? 


I think it's very similar to “free speech”. It's important to allow, but the equivalents of
yelling “fire” in a theater should be curtailed.


Another facet of this these days includes what you can do with biology on a kitchen 
table with about $10K-$20K worth of equipment. 


Or what you can make with 3D printing that wasn't at all easy to make previously. 


Vi Hart said "We must ensure human wisdom exceeds human power". We aren't 
putting forth the enormous societal effort needed to do this. 


The consequences of anti-social actions continue to get more and more life critical.


Alan Kay
Still trying to learn how to think better - 4y


What makes for a great software demo? 


Presentations are “theater”, and always about the audience. 


The vast majority of human beings are “instrumental reasoners” (meaning that they 
mostly judge on how an idea or tool contributes to their current goals). Only a small 
percentage will be interested in the idea or tool as the possible start of new goals. 


This means that showing “analogies” and “capabilities” are likely not to register as 
positive. Most people want to see more direct relationships to what they want to do 
and are trying to do. 


Marketeers try to home in on and sell to what people want — where the “customer is 
always right”. Educators try to home in on what people actually need — and people 
often don’t want what they need — so the “customer” is rarely right.


So if you are trying to educate rather than sell — and this is what “good theater” is 
usually about — then you have to put in a lot of effort to get people interested in 
ideas, and to get out of their current simple “heat seeking” behaviors and judgements. 


McLuhan was famous for not trying to get an audience to agree with him: his goal in a 
talk was to just get them thinking at all! This is actually a good ploy (though I'll
confess that I usually have didactic aims beyond just waking an audience up).


The 1979 demo of the Alto and Smalltalk we gave to Steve Jobs worked to the point 
that it did because (a) he was looking for something better than what they had, and 
(b) every time he asked a question or wanted a feature, Dan Ingalls and Larry Tesler
were able to make it happen in a few seconds right in front of him.


However, as he mentioned some years later, he missed thinking about why Dan and 
Larry could do all of those things in a few seconds, and why the system could do what 
it did in general. And he missed the Ethernet, etc. He got what he wanted, but he 
needed more. 


I think this sums up what was — mostly — a pretty effective demo.


Alan Kay
Still trying to learn how to think better - 4y


Does Sequoia Capital have enough cash from its investments to create a 
modern day Xerox PARC? 


The old Don Valentine company that funded Atari, Apple, and many others ... 


The computer part of Xerox Parc was relatively small — Butler Lampson has estimated 
that the many major inventions were done by about 25 computer scientists (and this 
would imply another 25 in support). So about 50 people. 


Let's double that to deal with the eventual size of the computer part of Parc
(about 40 computer scientists), so let's say about 100 people. 


Back then the costs were different (e.g. the inflation in housing was another factor of 
10 times regular inflation). Salaries were also a bit proportionately lower. But hardware 
costs were quite a bit higher — both in general, and also because Parc bought its way 
almost 15 years into the future by making its own personal supercomputers and 
anything else needed to “compute in the future rather than in the past”. 


I've estimated that the yearly costs for something like what we had in today’s dollars 
were about $12M-$15M. 


If we expanded this to $20M to $25M per year in our estimate here, we should be able 
to fund a lot of capability. 


So a better way to pose your question might be: “Are there any Fortune 500 
companies that couldn't afford to set up a Xerox Parc?” 


Or: “Are there any governments in the world that couldn't afford to set up a Xerox 
Parc?” 


Or even: “Are there any major universities that couldn't afford to set up a Xerox Parc?” 


I.e. if you restrict your head count to just the very best, and support them, and there's
a great vision they can feel, and they can work together and find their own goals and 
problems from the vision ... wow! 


Alan Kay
Still trying to learn how to think better - 4y 


Did early computing necessarily have to be in binary, or could there have 
been another completely different concept that would have evolved into 
the technology we enjoy today? 

Babbage’s “Mill” (the CPU of his engines) — along with other arithmetical computers 
before and after — used gears, and worked in base 10. These schemes were “digital” 
— meaning they represented discrete values. Since the gears turned, the apparatus 
was “analog beneath”, and part of the apparatus had to ensure that the values were 
eventually quantized. (And consider the several roles of the escapement mechanism in 
a clock ...) 


There were also quite a few analog computers made over the years, especially towards 
the end of the 19th century and well into the 20th century that used mechanical or 
electrical or combinations to represent quantities plus or minus a bit of noise. 


One of the many difficulties with noise is that it can often build up to the point where 
the quantity is no longer a reasonable approximation. Making a system that deals in 
discrete values helps a lot, because you can reset communications from each stage 
(one way to think about it is that quantizing a signal reshapes it into something 
standard). 


So you could easily make a communications system that transmitted 10 voltages 
spaced widely apart enough so that which one was intended could be recognized by 
the receiver and reshaped to the standard value. You could make operational 
hardware to add, subtract, etc. these. This would be “digital decimal”.
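
The reshaping idea can be sketched with a small simulation. This is only an illustration under made-up assumptions: ten nominal levels spaced 1.0 apart, a bounded noise of up to 0.3 added at every stage, and 1000 stages. The names `LEVELS`, `NOISE`, `regenerate`, and `send_through_stages` are hypothetical and not taken from any real system.

```python
import random

# Ten nominal "voltages" spaced 1.0 apart (hypothetical units, for illustration).
LEVELS = [float(v) for v in range(10)]
# Assumed per-stage noise bound; it must stay under half the level spacing (0.5).
NOISE = 0.3


def regenerate(value):
    """Snap a noisy value back to the nearest nominal level."""
    return min(LEVELS, key=lambda level: abs(level - value))


def send_through_stages(value, stages, regenerating, rng):
    """Pass a value through a chain of noisy stages, optionally reshaping it each time."""
    for _ in range(stages):
        value += rng.uniform(-NOISE, NOISE)  # every stage adds a little noise
        if regenerating:
            value = regenerate(value)        # "digital decimal": reset to the standard value
    return value


rng = random.Random(1)
analog = send_through_stages(7.0, 1000, regenerating=False, rng=rng)
digital = send_through_stages(7.0, 1000, regenerating=True, rng=rng)

print("without reshaping:", analog)   # the noise accumulates as a random walk
print("with reshaping:   ", digital)  # still exactly 7.0
```

Because the assumed noise never exceeds half the spacing between levels, the reshaped value is recovered exactly at every stage, while the unreshaped value is free to wander.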


However, you could also take advantage of the fact that you only need 4 “digital
binary” values to cover the values from 0 to 9. And 10 binary digits will represent up to
roughly 1000, and 20 to roughly a million.
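
The arithmetic here is easy to check. A minimal sketch (the helper name `bits_needed` is just an illustrative choice):

```python
def bits_needed(distinct_values):
    """Smallest number of binary digits that can label this many distinct values."""
    return (distinct_values - 1).bit_length()


decimal_digit_bits = bits_needed(10)  # bits to cover the ten values 0..9
ten_bits = 2 ** 10                    # distinct values in 10 binary digits
twenty_bits = 2 ** 20                 # distinct values in 20 binary digits

print(decimal_digit_bits, ten_bits, twenty_bits)  # prints: 4 1024 1048576
```

Four bits actually give 16 combinations, so a decimal digit stored this way leaves six of them unused.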


So: if making “binary digits” allows much deeper engineering and ultimately simpler 
systems design, then it's worth concentrating on a completely binary system. 


And, we don't have to give up on analog ideas either, because we can use sampling to 
represent any waveform at any desired accuracy (even far beyond the ability of any 
analog system to stabilize). This is what happens these days with sound recording and 
reproduction (if it is really done well, you get the best of both worlds). 


Alan Kay
Still trying to learn how to think better - 4y


How do you tune the 1st string of the guitar without a tuner? 


An old trick, which may not work so well today, was to use a fluorescent light* as a 
reference. 


They used to emit a nice 60Hz hum (actually one or two octaves above this depending 
on the fixture), and this is just slightly sharper than a Bb (the Bb is a little above 58Hz).
You can tune one of the Bbs on your guitar to this (e.g. the 6th fret on the E string) 
and tune the rest of the guitar from the open E. 
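
As a rough check of the pitches involved, here is a small calculation. It assumes equal temperament with A4 = 440 Hz, which the answer does not state, and the function names are only illustrative.

```python
import math

A4_HZ = 440.0  # assumed concert pitch; not specified in the answer


def equal_tempered_hz(semitones_from_a4):
    """Frequency of the note this many semitones above (or below) A4."""
    return A4_HZ * 2 ** (semitones_from_a4 / 12)


def cents_between(freq_hz, reference_hz):
    """How far freq_hz sits above reference_hz, in cents (100 cents = 1 semitone)."""
    return 1200 * math.log2(freq_hz / reference_hz)


b_flat_1 = equal_tempered_hz(-35)      # the Bb just below 60 Hz, about 58.27 Hz
fret6_low_e = equal_tempered_hz(-23)   # the Bb at the 6th fret of the low E string

hum_offset_cents = cents_between(60.0, b_flat_1)
octave_hum_offset_cents = cents_between(120.0, fret6_low_e)

print(round(b_flat_1, 2), round(hum_offset_cents, 1))
```

Under these assumptions the hum sits roughly 50 cents (about half a semitone) above Bb, and the octave-up hum at 120 Hz has the same offset from the 6th-fret Bb. A guitar tuned this way would agree with itself but sit noticeably sharp of concert pitch.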


*I should also have mentioned that the average guitar amp still puts out a bit of 60Hz
hum, so this could be the go-to solution today. (On the other hand with a computer 
or smart phone you can ask any Internet browser for any pitch ... so that could be the 
gazillion transistors solution to this problem ...) 


Alan Kay 
Still trying to learn how to think better - 4y
Is computer science distinct from the liberal arts? 


Let's first collect a few opinions on what constitutes “liberal arts”. 
Wikipedia says:


• Liberal arts education (from Latin liberalis “free” and ars “art or
principled practice”) can claim to be the oldest programme of higher
education in Western history. It has its origin in the attempt to discover
first principles — ‘those universal principles which are the condition of the
possibility of the existence of anything and everything’.


• The liberal arts are those subjects or skills that in classical antiquity were
considered essential for a free person (liberalis, “worthy of a free person”) to
know in order to take an active part in civic life, something that (for ancient
Greece) included participating in public debate, defending oneself in
court, serving on juries, and most importantly, military service. Grammar,
logic, and rhetoric were the core liberal arts (the trivium), while
arithmetic, geometry, the theory of music, and astronomy also
played a part in education (as the quadrivium).


• The liberal arts today can refer to academic subjects such as literature,
philosophy, mathematics, and social and physical sciences.


Many universities follow this general scheme, e.g. “the engineering disciplines” are 
usually placed in a different division of the university. 


And if we take the early notions of “computer science” as put forth by some of the 
founders of our field — Herb Simon, Alan Perlis, etc. — which take the word “science” 
literally as “understanding by making maps/models drawn from phenomena” — and I
agree with them — then we would place “Computer Science” in the Liberal Arts 
division along with mathematics and the sciences. 


So, my answer to the question is “No”. 


The main good thing about this is that it encourages universities — and profs and 
students — to not just use "CS" as a term for many things that are more properly 
engineering pursuits than science pursuits, and perhaps to try to do a little more 
proper “computer science”. 


However, I've always felt that there is much too much stovepiping going on 
everywhere, and that “liberal arts education” — broad and useful as it is — should be 
even broader. This is not just because important subjects like “real science” have 
important engineering parts to them, or that it's not possible to do engineering in 
most areas without getting fluent in science. 


For one thing, it is possible to get a liberal arts degree today without getting even to 
the first stages of fluency in science or math. This seems quite wrong, if only from the 
standpoint of the emphasis on important and central ideas for humanity. 


As a much larger example, today | don't think that ‘those universal principles which are 
the condition of the possibility of the existence of anything and everything’ can exclude 
modern engineering and allied fields, because the latter have brought a lot of 
important new things into existence, and therefore have shown that they exist, and 
therefore the processes and principles, etc. should be added as a proper focus of 
philosophy, etc. 


I believe I partly have this opinion because of getting an “A in unpopularity” with
my Junior High School teachers, who told me that I had better pass the test for
Brooklyn Technical High School and “get out of here, or else!”. So I did, and did.


This led to a regimen that was about 6 years of work stuffed into 4 years, in which the 
regular HS academics, more science and math, and a four year engineering curriculum 
were jammed into long days with 6000 other boys in an enormous block long and 
block wide 8 story building. “Electives” were so elective that you did music, sports, etc, 
after school hours: this actually worked quite well albeit requiring longer days. 


This was one of those experiences which was quite illuminating and helpful after it was 
over with. I think most of us would agree afterwards that the extra work elevated every part
of the work. 


One of the surprising things I found when I got to college is that they didn't make you
work very hard to get a degree. I had gotten used to Tech so I wound up with 2 major
concentrations and 2 minor concentrations (and I was very happy that there was some
time left for working on theatrical productions in the excellent theater department). 


Still, even back then in the early 60s, I felt that the universities should be more heavy
handed about getting students to get fluent in “humans in the large and civilization in 
the large” as a necessary part of a modern education. 


I fear that this is even more complicated today, given that (a) high schools seem to be
accomplishing less, (b) more and more time in early college years is more like 
remediation than new learning, and (c) even major universities today seem to be 
doing something more like trade school training for vocations than carrying out the
proper processes for education. 


I'm not advocating setting up college like Brooklyn Tech! For example, each semester, 
besides math, science, English, history, foreign language, there were also at least 3 
engineering courses (of which one was “engineering”, another was “design and
mechanical and industrial drawing”, and the third was some form of large scale hands-on
“shop” experience (like green sand casting, industrial machine shop, building a
house or airplane)).


However, to just pick one illuminating example, the “engineering course” for all of 
Freshman year was “Industrial Processes”, which was in depth learning about how 
pretty much every main thing in 20th century society was made, and in quantity: every 
erg of energy, metal, plastic, wood, paper, petrochemical, pharma, etc. I found this
fascinating from so many angles — and facing up to the industrial revolution in depth
by looking at the hows and whys also provides grist for modern philosophical mills. 


Today I couldn't imagine setting up a two year “humans and civilization” course in a
liberal arts college without including something like this, and a number of other 
important things drawn from outside the traditional liberal arts. 


Could you? 


Alan Kay
Still trying to learn how to think better - 4y


What would Alan Kay suggest is the ideal way for somebody to start
learning about computing at a corporate level?


After procrastinating for years I finally seriously took up classical pipe organ at age 40
while Chief Scientist at Atari and then a Fellow at Apple Computer. As with most
people of that age, I was already quite busy.


Learning the pipe organ is 2+ hours a day for quite a few years, with a ratio of
2/3 classical piano to 1/3 organ. Sometimes this would be just classical piano for
many days and even months. 


I mostly did this by getting up really early in the morning to have the freshness, time,
and solitude needed for the 2-3 hours a day I spent. (It was quite difficult.)


The basic idea of fluency is to put enough time and focus into the learning process to 
have the new things “become you” as “you become them”. Part of the result is a
different and fleshed out “intuition” about what you are trying to do to go along with 
the many skills needed. 


I was helped by having learned other instruments (jazz guitar) fluently when I was a
teenager. This did not help with the mechanics at all, but did help with having a
feeling for music, how it is written, what it means to play it, etc. There was no question
that many parts of this learning were much more difficult to internalize than when I
was a kid. 


Another perspective is that I've worked with many intelligent busy adults over the 
years, both in corporate and educational type settings, and have found that most of 
them, even with considerable desire, have not been able to put enough sustained will 
together to fit a really new big thing into their adult learning curve (I in fact was quite 
desperate about this for many years before I just did it).


However, in an earlier period of my life I also taught guitar to a wide range of
students, including adults. The adults had a real advantage over the kids by having 
much more perspective about themselves and their processes — they could decide to 
practice, and would practice. They were generally much more limited than the kids via 
how long it took to learn physical things, and their biggest barrier was often having a 
much stronger idea of a goal (which was almost always to be able to play a favorite 
song or two like one of their musical heroes). So it was hard for them to be happy with 
stuff that wasn't their target. I spent quite a bit of time making simpler versions of
their target so they could feel they “were on their goal”. The kids on the other hand 
learned more easily but usually had real problems with practicing. The ones that didn’t 
have these problems just soared. 


A lot of the consulting and advisory boards etc. I've done and been on over the years 
have been for corporations, and two of the main problems are that (a) the higher-level 
executives don't understand computing, and (b) the employees who “do computing” 
don't understand computing. 


Both are really difficult to teach for the reasons above, and the additional difficulties of 
“(b)" of having gotten fluent in a weak version of the pursuit. Since humans are terrible 
at forgetting, new learning has to go over what is already there, and there can be 
deep conflicts and frustrations. 


And given this, it should be obvious that the “(a)” people have also learned things that 
will make learning some of the new things more difficult than they would be for a 
child. 


Still, a big plus — and the military has this also — is that if the corporation is 
interested in having its employees — especially the higher-level ones — learn about 
computing, then by fiat it can make the time available during the day as part of the 
jobs (and salary) of the learners. Quite a bit of time will still be required, but under 
these conditions quite a bit can be done, and perhaps rather analogous to what it 
would mean to have the goal to get fluent in a foreign language. 


And, in fact, there are examples of this in corporate history. My favorite, and I got to
watch the process firsthand while being on the corporate advisory board, was the 
computer/systems-design training regime of the 32,000 employees of Andersen 
Consulting in the mid-80s. They were extremely serious about how they went about 
things, so they had bought a 4000 bed university (formerly a Catholic institution) in St. 
Charles, Illinois and used it to train their “Arthurs” in many things, and most especially
in “Method/1” which was the strategy and tactics of design and implementation for
their engagements as consultants with their clients. Pretty much every “Arthur” spent
at least two weeks each year at St Charles, and in their earlier years, spent much
longer. All of this plus exploratory research was done by the Technical Services
Organization (run while I was there by the amazing Mel Bergstein).


This was essentially “plug compatible commando training” and was tremendously 
successful. It missed much of what was really powerful about computing, but it was 
very compatible with their clients, and thus made a large and successful business. If it 
were music, it was about marching, playing marches, and composing marches. 


The TSO people were much more adventurous about the larger scopes and powers of
computing, and part of the process I was involved with was to see what could be done
about the larger issues of computing, especially given the resource of St Charles. (This
also eventually involved getting the inimitable Roger Schank to bring his whole
sub-department at Yale to Northwestern to help shape the new experiences (a different
story for a different time)).


One of the most interesting groups within TSO was the “object-oriented design” team 
that was based in New York. I think there were 30 or perhaps 40 of them, and this
group had gotten very good at understanding *real* object-oriented design and 
implementation. This took them about 3 years, and they were supported and 
sheltered by Bergstein. (Another great story for another time was how all of this paid 
off with the still amazing Brooklyn Union Gas design and implementation during the 
late 80s.) 


I think it is much easier for a group of people to learn good versions of “real 
computing” together, and they can if they are given the time and the support by their 
company. The most critical parts of the process are how to introduce the really 
important things that need to be learned that will seem to be “off task" by both the 
company and the learners (this is because most “normal” corporate computing could 
hardly have a worse process or ideas about making systems). 


So, as usual, learning something new and important is almost always going against 
“normal” (which if you think about it in regular society, means “doing something 
crazy"). This is the biggest hurdle for improving computing in corporate organizations. 


Alan Kay
Still trying to learn how to think better - 4y


What classes did Alan Kay teach at New York University’s Interactive 
Telecommunications Program, UCLA, Kyoto University, and MIT? 


An elliptical answer in two stories ... 


In the Fall of 1966, via quite accidentally winding up as “grad student #7" at the U of 
Utah computing grad school — after undergrad in math and biology and working my 
way through college as a programmer at the National Center for Atmospheric 
Research in Boulder, Colorado — I discovered that it was one of the “ARPA Project”
universities (which meant absolutely nothing to me as I walked into Dave Evans’
office). 


Many things hit me over the first few days which I've recounted elsewhere, but I've 
given less mention to the thrill of discovering that the main designer of the Burroughs 
B5000 computer — Bob Barton — had been convinced by Dave to move to Salt Lake 
and be a professor while he was consulting for Burroughs. 


I had learned enough about this amazing machine to want to understand the parts I
couldn't make sense of. So I signed up for Barton's “Advanced Systems Design” course
with great excitement. 


Bob Barton was about 6’ 3” or 4" with the face of an imp, a mind that ranged from 
impish to extremely impatient and depressed, a command of language and 
vocabulary rarely encountered, and one of the true geniuses of our field. 




He did not like students (nor most computer people, nor most people). 
He stomped into the first class of this course and handed out a list, saying: 


“There are a few things known about advanced systems design, and most of them are 
in these papers. I expect you to read them and understand them thoroughly.


“But my job here is to firmly disabuse you of any fondly held notions you might 
have brought into this classroom.” 


The “course” was him finding out the many things that each of us believed and 
demolishing them. (Later I found out that quite a few of the things he demolished
were also things he believed in!) 


This was one of the two or three greatest gifts any teacher has given me. 


The second story was told to me by my friend Neil Postman (the marvelous author of 
“Amusing Ourselves To Death", “Teaching As A Subversive Activity”, and many other 
wonderful thought provoking books). 


When Neil was a grad student (in English), he and another grad student decided to 
take off from school for a while and meet Marshall McLuhan — this led to them 
following him around from talk to talk. McLuhan didn't really sleep, so the two kids 
would sit up with him in his hotel room while he smoked cigars in bed in his dressing 
gown and just talked and talked and talked. 


Neil said they noticed when Marshall gave a public talk and was asked a question or 
was challenged on a point, he would just ignore these, and would say to the person in 
the audience: “Well, how about this one?” and proceed to utter more of his “zen- 
koan-like” slogans. 


After a few of these they realized that McLuhan didn't care whether the audience 
agreed with him or not. McLuhan cared about whether the audience was thinking 
at all. And his job — and his life — was to wake them up. 


I hope the point of these two stories is clear. One way to think about it is that anything
that is good that happens in “schooling” is when a human being decides to put in the
work needed to learn deeply enough so they wind up a different and better thinker as
a result.


(What then should be the job of the “teacher"?) 


Bob Barton and Marshall McLuhan both understood this. 


Alan Kay
Still trying to learn how to think better - Updated 4y
What are the Seven Wonders of computer science? 


I love this question — in no small part because I don’t think a really correct/exclusive
answer is possible. 


I have a particular set of meanings for the term “computer science” — the questioner
might have meant to say “computing”. A few of my items might be taken from 
“computing” rather than from “computer science” per se. 


The things I have found to be astonishing and amazing (and shocking) are:


1. Turing’s notion of machines that can simulate machines completely by
interpreting their descriptions (exhibiting the programmable computer as “a
language machine” and a “meta-language machine”) — along with this is the
simplicity of what is required to do so (a great book is Marvin Minsky’s
“Computation: Finite and Infinite Machines”). Turing’s approach is much
more of a “real CS” approach compared to Goedel's earlier methods, and
soon led to a large number of important next steps.


2. <this one is tentative at this point> How simple (a) it is to design a whole
computer from just one kind of logical element (e.g. “NOT-BOTH”), especially
when compared (b) to how Russell and Whitehead struggled to “bootstrap”
mathematics, etc., from logic at the turn of the last century. (This is one of
those “Point of View is Worth 80 IQ Points” ...)


3. Lisp, and McCarthy's general approach to “mathematical theories of 
computation” and having languages that can act as their own metalanguage. 
One of the great cornucopias of our field. 


4. Sketchpad by Ivan Sutherland for so many reasons, including: the approach 
to interactive computer graphics and the simulations of the graphic 
relationships, the “object-oriented” approach to definition and deriving new 
kinds of things (including “masters” and making instances from masters), 
enormous virtual worlds that are windowed on the display, the use of goal- 
directed programming with the system solving the simultaneous goals in 
real-time, etc. And more, including the demonstration that a simulated 
computer on a computer need look nothing like the underlying hardware or 
any “normal” idea of “computer”. 


5. The big Shannon et al. ideas about how to have imperfect things be 
organized in systems that are much more perfectly behaved even if the 
organizational mechanisms are themselves noisy. Includes all forms of 
“noise”, “representations”, “communications”, “machines”, etc. and poking 
deeply into Biology and how living things work. Nice implications for
“stochastic computing” of many kinds which are needed more and more as
things scale.


6. The deep implications of “symbolic computation” (now a very un-funded
area) for being able to move from the trivialities of “data” (no matter how
voluminous) to the profundities and powers of “Meaning”. This used to be
called “AI” and now has to be called “real AI” or “strong AI” (it would be
much better under a less loaded term: how about “Flexible Competence”?)


7. The Internet. Certainly the best thing done by my research community, and 
the first real essay into the kinds of scaling and stabilities that all computer 
science should be trying to understand and improve. This was a great 
invention and development process in all ways, and — by looking at Biology, 
which inspired but we really couldn't use — it had a reasonable chance to 
work. That it was able to scale stably over more than 10 (maybe 11) orders of 
magnitude, as indeed planned, is still kind of amazing to me (even though it 
should have). Judging from most software systems today not being 
organized like the Internet, one is forced into the opinion that most 
computerists don't understand it, why it is great (and maybe don't even think 
of it as the fruits of “real computer science” because it just works so much 
better and more reliably than most other attempted artifacts in the field). 
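As a rough illustration of item 2 above, here is a minimal Python sketch that builds an adder out of nothing but a single “NOT-BOTH” (NAND) element. The function names and the 8-bit width are assumptions made for this example; it is only meant to show how little is needed to get from one logical element to arithmetic, not to describe any particular machine.

```python
def nand(a: bool, b: bool) -> bool:
    """The single primitive: true unless both inputs are true ("NOT-BOTH")."""
    return not (a and b)

# Everything below is built only from nand().
def not_(a: bool) -> bool:
    return nand(a, a)

def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))

def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))

def xor_(a: bool, b: bool) -> bool:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def full_adder(a: bool, b: bool, carry_in: bool) -> tuple[bool, bool]:
    """Add three bits; return (sum bit, carry out)."""
    partial = xor_(a, b)
    total = xor_(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return total, carry_out

def add(x: int, y: int, width: int = 8) -> int:
    """Ripple-carry addition of two unsigned integers, modulo 2**width."""
    carry = False
    result = 0
    for i in range(width):
        bit, carry = full_adder(bool((x >> i) & 1), bool((y >> i) & 1), carry)
        result |= int(bit) << i
    return result
```

For example, add(19, 23) gives 42, and add(200, 100) wraps around to 44 because the sketch keeps only 8 bits.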


Alan Kay
Still trying to learn how to think better - 4y




When is incremental progress good? 
“Better and Perfect are the Enemies of ‘What Is Actually Needed’ ". 


If WIAN can be accomplished incrementally without changes in Type, then there are 
many advantages to this, including low risk and high integrity of results. (Basically you 
want engineers to design the bridges and airplanes that will be used by the public, not 
scientists — who might be more interested in collapses and crashes!) 


There are two lurking bugs here, both important: 


1. If only “better” is accomplished without WIAN, then a pernicious process has 
been started, where “better” will eventually mean “worse”. The bug is 
confusing “better” with WIAN. 


2. If WIAN means “something qualitatively different” — a change in Type — 
then “incremental” will almost never get there — it is much more likely to be 
asymptotic to WIAN. The bug is confusing WIAN with the Type of what we 
have. 


Bottom Line Heuristic: Always look for WIAN as carefully as possible — and take as 
long as you possibly can to identify it. (The champ in my experience was our hero, 
friend, and colleague Paul MacCready).


I hope readers of this will supply examples from their own experience (I have many)
but for this answer it will be more fun and instructive to collect experiences. 


Alan Kay
Still trying to learn how to think better - 4y


Does Y Combinator's "100 happy users" advice originate from Butler 
Lampson? 


Eddie's answer is “plausible”. 


Butler did get us to agree to the "100 users” idea. 


Eric Schmidt actually worked with Butler when he was at Parc, and certainly picked up 
the idea while there. 


Eddie's reference (4) seems diffuse on this point, especially regarding when one 
should think about scaling ... 


Alan Kay
Still trying to learn how to think better - Updated 4y


If Alan Kay made an introductory technical course syllabus to teach real 
computer science to undergraduates, what would be on it? 


I will add to and change this from time to time because there is not enough room in a
Quora-sized answer to provide a good enough answer, nor enough time to really 
think things through as needed, but ... 


What if the intent here is “as a required course for all undergrads” and the question
said “real music” instead of “real computer science”?


In both cases there would be a wide spread of knowledge and skills in the student 
body, from “no previous contact” to “been involved professionally”. I think one
difference in the music case is that some of the very involved students would be 
“classically trained”, and that would almost never be the situation for incoming 
computerists. 


And in any case, most of the students will represent more of a pop culture spread of 
knowledge. Because there is a lot of playing and other kinds of doing — composing 
— in music to go along with the theory, I think that it would be a must to have a
number of introductory courses — some several years long — to give the less
prepared students a chance to build up basic knowledge and skills as “practical doing”
within a framework of “real music”.


There's also a happy medium to contend with regarding prior experience: it is very often
the case that “pop culture” skills that are deeply embedded but not comprehensive can be a
big barrier to really learning a “real X”, especially one that involves various kinds of
“literacy”, “reading”, and “writing”, and years of work. So starting pretty much from the
word Go can be a real boon if there is enough time to develop the deep ideas in the
best ways.


To pick just one more of the many issues that need to be considered: an
undergraduate education is supposed to be much much more than training for a job; 
the main purpose of college is to take students who think they are trying to get from 
A to B and to help them acquire enough perspectives and knowledge to see the 
previously invisible Cs and Ds as possible alternate and often better goals for them. 


In other words, the purpose is to aid qualitative changes, not just improvements. So 
important parts of this curriculum should include “ideas from civilization in general” (I
will put this important area to the side for now). 


This brings up the woeful fact that so many students enter college without much of 
what high school was supposed to help them learn — and this forces much too much 
of the time spent in the first years of college as “remediation” rather than “doing 
college”. 


Finally, the actual process in most colleges is also anti-learning in many ways, 
especially the lecture-centered, subject/department-centered, grade-oriented 
processes. (This is one of the reasons I bring up “music” to help think about this
question — at least in a good music college, it is not lost that a lot of the time has to 
be spent actually doing music — and getting better at doing music — not just hearing 
opinions about it.) 


As for content in computing — which is very wide ranging these days — I would
concentrate early on two main things — systems and meta and how they intertwine —
and set up what is needed to get the students to a real first stage of fluency in thinking 
and doing with these ideas. 


Both of these have easy starts to them, and both go really deeply into many of the 
centrally important topics, issues, and difficulties in the current states of the art. For 
example: simple-meta can be approached by designing and adding new language 
features in the spirit of the language; deeper-meta can be about changing some of 
the underlying primitives in important ways. Simple-systems can be about design and 
making computations, especially where scaling is involved; deeper-systems can start 
to look at how communications can be enhanced with “alien systems" via negotiation 
and other techniques that require various kinds of learning. 
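To make “simple-meta” a little more concrete, here is a minimal Python sketch of a toy expression language in which a new feature (`unless`) is added without changing the evaluator itself. Everything here (the nested-list syntax, the `MACROS` table, the name `unless`) is an assumption invented for this illustration and is not the language the answer has in mind.

```python
import operator

# Primitive operations of the toy language (names are illustrative).
PRIMITIVES = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "<": operator.lt}

# User-added forms: each rewrites an expression into forms the
# evaluator already understands.
MACROS = {}

def evaluate(expr, env):
    """Evaluate a nested-list expression in the environment `env`."""
    if isinstance(expr, str):            # variable reference
        return env[expr]
    if not isinstance(expr, list):       # literal number or boolean
        return expr
    head, *args = expr
    if head in MACROS:                   # added feature: rewrite, then evaluate
        return evaluate(MACROS[head](*args), env)
    if head == "if":                     # the one built-in conditional
        test, then, otherwise = args
        return evaluate(then if evaluate(test, env) else otherwise, env)
    return PRIMITIVES[head](*(evaluate(a, env) for a in args))

# "Simple meta": add `unless` in the spirit of the existing `if`.
MACROS["unless"] = lambda test, body, otherwise: ["if", test, otherwise, body]

program = ["unless", ["<", "x", 0], ["*", "x", 2], 0]
result = evaluate(program, {"x": 21})    # x is not negative, so x * 2
```

In this sketch, “deeper-meta” would correspond to changing `evaluate` or `PRIMITIVES` themselves, for instance replacing the built-in `if` with something different.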


I would most certainly — with colleagues — invent and make a special programming
language and environment to provide a smooth and simple enough vehicle for 
getting fluent in systems and meta: “Simple things should be simple, complex things 
should be possible”. 


I'd take a different path from SICP, and use a different vehicle, but retain the idea of
“safe and expressive meta” as a good route to represent meanings in ways that allow
more profound meanings and processes to be created.


The important idea here is parallel to: “Physics is not Math, but you really need Math 
to really understand Physics”. 


Similarly: “Computer Science is not Programming, but you need to make systems via
Programming in order to really understand Computer Science”. 


The wide ranges of: learners, kinds of help needed, and motivations (including 
negative) that students bring to a required course have to be carefully heeded. 


For example, it is very worth perusing the considerable lore that has now been 
gathered about teaching computing to wider ranges of students (for example from 
Mark Guzdial at Georgia Tech, now at U Michigan). 


One indication is that entering undergrad students at an engineering school like 
GaTech are often very ill prepared along many dimensions, and many will have 
considerable difficulty grasping accurate enough understandings of mechanisms in 
programming languages in the entry-level courses. 


In the old days, one might ask “how did they get into an engineering college without 
decent prep?” But today, colleges are much more like businesses, and for a variety of 
essentially monetary reasons will generally try to (a) admit to a level of bursting 
capacity, and (b) will try to “retain” the headcount rather than try to sort. 


Because of the generally poor high school educations available, this is not necessarily 
a bad thing, but it has greatly changed the relationship of students to the thresholds 
of quality needed in many subjects. 


In any case, this means that what I'd like to do in an introductory course for the 
students would really have to have a wonderful development environment to help 
them build good working models in their minds. 


I think of help using a rule of thumb: some students don't need much help, some will 
need help of “Type A", some “Type B", some “Type C”, and some you won't be able to 
figure out how to help. The environment needs to do really well with A, B, and C. 


This has many useful parallels with music learning. Some students will have very good 
pitch senses, others great rhythm senses, others great muscle and other memories, 
others a “feel” for harmonic theories, etc. It is generally the case in music that the 
budding musician will not have great skills in every part of music, and will wind up 
doing a lot of practicing developing what's needed. So: you really can get much much 
better at hearing what is going on via “ear training”, at “tracking others to be ‘in a
groove’ with them rhythmically”, to “know where you are and where you are going” 
while improvising, certain kinds of practicing will help the various musical memories to 
get deeper and faster, etc. 


In Alan Kay's answer to How can one become a better thinker? I mention a little about
learning and thinking better being partly the creation of “brainlets” to offload as much 
of the work as possible. The students need to understand that this is part of the 
process for really getting fluent in anything, and to understand what is needed to help 
the "brainlets” form and grow. 


Dealing with the centrally important “one size does not fit all” problem is one of the 
most important parts of any course design. For one thing it argues against lectures- 
as-content — even if they were otherwise a good idea (they really aren't!). 


I think an “inspirational talk” every so often can be a great thing, but we shouldn't 
confuse what is good about oral cultures — the emotions, romance, theater, and 
sense of belonging — with literate cultures — the much-larger better-organized 
multi-perspective self-paced, etc., Better Organization Of Knowledge. 


Today we have the possibilities of: a DYNAmic Better Organization Of Knowledge, and 
we need to make this, and use it to the fullest. 


Here's another analogy to music learning. Generally, one has a private lesson with a 
teacher once a week (maybe twice, though that is pushing it). Most of one’s time in a 
week is spent practicing and learning. Almost without exception there are ongoing 
group experiences: orchestra, chamber music (including “piano band" if one is a 
keyboard player), chorus, etc. And there will be some courses e.g. in theory, 
composition, repertoire, etc. 


We can see that the balance between a bit of private tutoring and considerable work 
on one’s own is very different from standard US practice, but is like the “tutor” system 
that has been used at a number of UK universities such as Oxford and Cambridge. The 
tutor acts as a guide and quality control, whereas the student has to do most of the 
digging and practicing and other forms of doing and learning (this is a good thing!). 


As for the group experiences, this used to be easier and more common in computing’s 
“deep past”. Major universities would get — or even better make a computer — and 
the students would form part of the larger design and building experiences for both 
HW and SW. 


Even if a computer was purchased, the operating system and languages on it were 
usually pretty bad, and there was a lot of motivation to do a better, more useful set of 
tools. 


(Today it is still true that the OSs and languages are not very good on purchased 
computers, but the perspective to see this has been pretty much lost, and the will — 
and even expertise — to design and make a better OS or language (or better 
subsystem) in university has been lost along with it). This has removed from students 
of today much of the “real” from “real computing”, and quite a bit of the fun. 


I think for a course like this, as a stand-in for the “new computer" in the old days, I'd 
pick something (a) that is like a whole computer, that is (b) going to be used every 
day, (c) is badly conceived, (d) I'd help them see that “normal” does not necessarily 
mean "good" or “acceptable”, and (e) have them learn by doing the designs and 
building something much better. 


How about the web browser as a target? (Yikes!). This could be doubly interesting 
because it runs on top of the Internet — perhaps the best large system ever done in 
computing. 


This would be a bit tricky to set up, but quite interesting to see how this could be 
approached as a large group target. One would start by looking at what TCP does, and 
what it could do, and what the scaling of users — especially authors — implies about 
what is needed to deal with the major issues for both “reading” and “writing” and 
“extending” the new medium across billions of nodes and trillions of objects. 


(It's worth pointing out that quite enough was known about systems design in the 70s 
to make a very few good operating systems, programming languages, and the
Internet itself. Unfortunately, essentially none of this knowledge is apparent in the 
designs of the Web Browser, and precious little for the Web itself.) The good news is 
that an amazing amount of this can be cleaned up and simplified by much better 
designs that run directly on IP (or on an extension of TCP/IP). 


This makes a lot of sense to me because today “the computer" is “the Internet of 
computers” both connecting physical hardware together outside of the machines, and 
also connecting the virtual software together inside the machines, and all as one 
network idea. 


<Hopefully not much more to come here> 


I'm guessing that it would take a minimum of a year — more like two — to do the 
design and prep for this course. For one thing, it would take a while to design and 
make the programming language and environment that would be used, and it will 
take more time (and real effort) to have the environment also be able to supply the 
useful “helps” that many of the students will need. 


Many parts of the curriculum will need to be tested, so some of the design and testing 
can be done as projects in other courses, and with the equivalent of the paid students 
who are used as test subjects in psychology experiments. 


Our experience in doing past curricula in schools is that it takes about 3 years to get 
even one that is adapted from already tested materials to jell and work. Part of this is 
to give the teachers time and experience to get comfortable with both the subject 
matter and the rhythms induced between this and the spread of students who will 
need to be helped in various ways. Since — here — we are planning to do the teaching
ourselves, we can probably cut off a year from the prep time. So let's say “really two 
years”, and “maybe we could start with examples in one year of prep”. 


So, here, I can only jot down some notes — even just “impulses” and “instincts” — that
are occurring to me as I work through the ramifications of the question (these often
don't pan out, but I like to write them down as they occur).


The first big impulse is “start with something massively parallel in the spirit of
StarLogo, NetLogo, and “Kedama” (in Etoys)”, maybe with tinges of Elixir/Erlang
thrown in.


Because “cognitive load" is a main factor with introductions to new things, we want to 
somehow both (a) have the manifestation of the design be fun and really interesting 
— so it will need to involve interactions between the elements — and (b) we would 
like to have this experience use few enough elements to avoid as much as possible 
“induced blindness from complexities”. 


Examples like these will drive both the curriculum design and the design of the 
language and environment to try out ideas by using them, making them, etc. 


The second impulse was “vacuum cleaner!” (I have a feeling that this will wind up
further down the list, and that something more like “epidemic!” might be first. Still, 
what is actually going on with a vacuum cleaner is wonderfully opposite to what most 
people suppose, and the modeling of it is simple and easy to understand.) 
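As a hedged sketch of what an “epidemic!” example in the massively parallel spirit of StarLogo or NetLogo might look like, here is a small Python model in which every agent follows the same few rules. The grid size, infection probability, recovery time, and all names are assumptions chosen for illustration, and plain Python only imitates the many-agents style that those environments provide directly.

```python
import random

def simulate_epidemic(n_agents=200, size=20, steps=50,
                      p_infect=0.5, recover_after=10, seed=1):
    """Agents wander on a wrap-around grid; sharing a cell can spread infection.

    States: "S" susceptible, "I" infected, "R" recovered.
    Returns one {"S": ..., "I": ..., "R": ...} count per step.
    """
    rng = random.Random(seed)
    agents = [{"x": rng.randrange(size), "y": rng.randrange(size),
               "state": "S", "sick_for": 0} for _ in range(n_agents)]
    agents[0]["state"] = "I"             # a single initial case
    history = []
    for _ in range(steps):
        # Rule 1: every agent takes one random step.
        for a in agents:
            a["x"] = (a["x"] + rng.choice((-1, 0, 1))) % size
            a["y"] = (a["y"] + rng.choice((-1, 0, 1))) % size
        # Rule 2: cells occupied by an infected agent are contagious this step.
        contagious = {(a["x"], a["y"]) for a in agents if a["state"] == "I"}
        for a in agents:
            if a["state"] == "S":
                if (a["x"], a["y"]) in contagious and rng.random() < p_infect:
                    a["state"] = "I"
            elif a["state"] == "I":
                # Rule 3: infected agents recover after a fixed time.
                a["sick_for"] += 1
                if a["sick_for"] >= recover_after:
                    a["state"] = "R"
        history.append({s: sum(a["state"] == s for a in agents) for s in "SIR"})
    return history

history = simulate_epidemic()
```

The interesting behavior comes from the interactions between agents rather than from any single rule, which is the kind of property the curriculum notes above are after.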


A mapping to the “outside world” is eventually important, but a main aim would be to 
have the pedagogical system not exhibit properties that make it difficult to think 
about and use certain important good ideas (this is a real problem in most language 
systems in use today, whether academic or commercial — i.e. they have given up too 
much meta for the sake of other considerations, some merely historical and/or 
“religious” baggage). 


So: one of the main purposes of any early course in anything is to help the students 
gain enough knowledge and perspective to start to be able to criticize the current 
state of knowledge in the field that they are being required to learn. 


Alan Kay
Still trying to learn how to think better - 4y


Was the Xerox Alto a prototype or a finished product? 


It was neither. It was a “time machine” to allow individual researchers starting in 1973 
to work about 12-15 years in the future they were trying to invent rather than being 
forced to use and work around the hardware and software of the early 70s. 


Another way to look at this is: if you are trying to invent something new and different 
that will take a while to do — and in our case it was all the new software and user 
interface design that needed to be invented and built — and there is an exponential 
that looks “reliable” (as Moore's Law did back then) — then you will be hopelessly 
behind if you don't anticipate the changes that will be rendered by the exponential 
during the years of development. 


A further perspective on this is that since “a computer is a computer is a computer", 
you can reliably buy your way into the future by making a supercomputer in the 
present, knowing that it will be at commodity costs in 10-15 years. 
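The arithmetic behind “buying your way into the future” can be sketched in a few lines of Python. The two-year doubling period used below is an assumption for illustration (the answer only says the exponential looked “reliable”), and the function names are made up for this example.

```python
import math

def growth_factor(years: float, doubling_years: float) -> float:
    """How much more capability per dollar is expected after `years`."""
    return 2.0 ** (years / doubling_years)

def years_ahead(cost_ratio: float, doubling_years: float) -> float:
    """How far into the future a machine costing `cost_ratio` times a
    commodity machine lets you work, if capability per dollar keeps doubling."""
    return doubling_years * math.log2(cost_ratio)

# Assumed doubling period of 2 years (an illustration, not a measured value).
factor_12_years = growth_factor(12, doubling_years=2)
lead_for_64x = years_ahead(64, doubling_years=2)
```

Under that assumption, working 12 years ahead corresponds to a machine roughly 64 times more capable than the commodity machines of the day, and shorter doubling periods make the required factor larger still.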


For more about the structure of the Alto, see To what extent did the Xerox PARC Alto 
inspire the Apple Macintosh? 


The two big benefits of having a “supercomputer per person” were (a) that many 
experiments could be easily done without optimizations because of the speed 
available, and (b) if you did optimize then you could make the kind of software that 
would run on commodity machines (also optimized) 10 years in the future. “(a)” 
produced screen painting primitives, the GUI, and Smalltalk among many other new 
things, “(b)" produced what would be Microsoft Word, and other PC apps. 


This way of looking at things was quite standard in the ARPA research community, and 
those of us who went to Xerox Parc took this spirit with us. 


Because the future — as we saw it — would have one computer per person all 
networked together with resources, the “supercomputer per person” had to be 
engineered far beyond a demo, and had to be easily replicatable. (At Parc we always 
tried to engineer for ~ 100 users, so we had to be able to make 100 Altos, the 
Ethernet had to be able to handle 100 Altos, the laser printer had to be able to handle 
100 users, the software had to be reliable enough and documented well enough to 
support 100 users, etc.) 


Most of the design and engineering magic on the Alto and subsequent HW was 
primarily the work of Chuck Thacker. 


The Alto was quite successful in meeting its design goals, so eventually almost 2000 
were made: enough to spread around Xerox, and then to many of the ARPA 
universities. 


Meanwhile, Parc kept on inventing many more newer machines to stay in the future. 


The viability of an Alto type machine as a product really depended on the VLSI 
technology for main memory. The first year that an Alto-type machine could 
conceivably have been made a product was ca 1976-7. It would have then been 
salable for about what a stand-alone “word-processing machine” would be — so not 
cheap, but with lots more functionality and flexibility. 


Alan Kay
Still trying to learn how to think better - 4y


How do you understand that you don't understand the understanding 
demanded by a problem? 


I always assume I don't have the necessary understanding to start on anything (this is
a good heuristic for us humans to insert before thought and actions of any kind).


One consequence is that, because many “problems” are quite often not
actually “the problems”, this assumption provides better starts on “finding more
appropriate problems” and on finding better ways to deal with them.


Alan Kay 
Still trying to learn how to think better - Updated 4y 


What was computing like in the 1960's? 


I started computing ca 1961, and my job in the Air Force starting in 1962 was as a 
programmer for Air Training Command at Randolph AFB in San Antonio. I entered
grad school at the University of Utah ARPA project in 1966. 


With all due respect to Andrew Silverman's answer: although there weren't millions of 
computers back then, there was more demand than supply for programmers — so 
anyone with interest and aptitude could definitely get access (you had to go to the 
computers). 


Because of the Cold War, a lot of DoD money was being spent on the development of 
computing, pretty much right after WWII, and especially in the 1950s with the 
development of the SAGE early warning airplane tracking system. This had started at 
MIT with Whirlwind ca 1950, and many important early technologies happened on this 
machine, including: displays, pointing devices, interactive dynamic language, core 
memory, etc. A lot of engineering was done on these in the SAGE development, and 
this helped raise all boats. 


The early 60s also saw a deeper effort to replace punched card accounting machines 
(PCAM) with computers (this led to more than 14,000 1401 computers being built — 
and to the way IBM thought about the eventual System 360 in 1964). 


JCR Licklider started the ARPA Information Processing Techniques Office (IPTO) in 
1962, and this agency plus ONR and the Air Force did the bulk of funding for visionary 
long range projects in computing. IPTO alone funded about 16 large research 
community efforts spread around the country. 


The first fruits of ARPA funding can be seen by getting — it's online — the Sept 1966 
special issue on “Information” by Scientific American. This special issue was completely 
written by the ARPA community, including John McCarthy, Marvin Minsky, Ivan 
Sutherland, Dave Evans, etc. I cherish this because I first met this community in Nov
1966 as a grad student, and this was a great summary of what they had accomplished 
in the first 4 years. 


Today we have a paradox. We have enormous amounts of inexpensive computing 
power but very few ideas out and about: commercialism has produced a boring low-level
orthodoxy and created such a monumental stultifying “normal” that I think it is
much harder for youngsters to think about interesting things while being suffocated in
“really nothing much at all (but no oxygen either!)”.†


In the 60s, there was also a boring low-level orthodoxy — especially IBM — but 
because of ARPA et al. great ideas not only abounded and were being worked on, but 
many of the ideas were superior — in absolute, not relative, terms — to most ideas in 
view today. ††


When Congress in 1970 made a huge error and used a too broad brush to recast 
ARPA, ONR, etc., they threw the baby out with the bathwater and left most of the 
important computing research of the country to the whims of businesses — most with 
IBM's fatal disease — and government agencies — like NSF and the new DARPA — 
that were much too much under the thumb of Congress. 


In sum, computing in the 60s had a large super exciting part to it — it was a Grand 
Romance — that had a big payoff in the 1970s. I had fun as a programmer in the early
60s, and the best time in my life from the middle 60s to the end of the 70s. 


† There are a very few exceptions today to keep us from being totally gloomy, for
example: Bret Victor's “DynamicLand” is a breath of fresh air full of oxygen and
fruitful ideas for a much more exciting and humane future.


†† Here are a few personal favorite highlights of advances in computing in the 1960s
(many more will have to go unmentioned). Many, but not all, funded by ARPA-IPTO. 


1960 


* JCR Licklider, “Man-Computer Symbiosis”, which started the vision that led to
ARPA-IPTO funding and much of the personal computing and networking technology of
today.


* Algol 60, BNF, and Ned Irons’ “Syntax directed compiling”. 
1961 


* The Burroughs B5000 (main designer Bob Barton): a two CPU “higher-level” 
computer with tag bits on every word, unforgeable references, protective memory 
management and swapping, parallel processing, and a hardware implementation of a 
“byte-coded” execution engine tailored for Algol-like languages. A “Simula-machine" 
several years before Simula, and the first system to implement an operating system 
completely written in a higher-level language. Can't be too highly praised. 


* CTSS/MIT one of the first real interactive time-sharing systems as a “computing 
utility” for a whole community. 


* Paul Baran at RAND presents the first briefing on the possibilities of packet-switching
networks 


1962 


* Lisp 1.5, John McCarthy, et al. The very important follow-on from the invention of 
Lisp (can't be too highly praised) in the late 50s. 


* JCR Licklider starts ARPA-IPTO with initial funding to MIT for “Project MAC”. Can't be
too highly praised.


* “Sketchpad” by Ivan Sutherland at Lincoln Labs. Not just the invention of interactive
computer graphics as we know it today, but with “masters and instances” initiated 
“object-oriented” representations, and used “constraint solvers” as a very high level 
programming language. Can't be too highly praised. 


* Wes Clark's LINC at Lincoln Labs — arguably the first personal computer that 
included all the standard features 


1963 


* John McCarthy — Situations, Actions, and Causal Laws — a landmark follow up to
the “Advice Taker” paper “Programs With Common Sense”. | believe this is the first 
introduction of the idea of “labeled states” and “situations” as gatherings of “labeled 
states”. Can't be too highly praised. 


* JOSS at RAND, Cliff Shaw. The first truly well designed interactive programming 
system for end-users 


* Thomas Evans, Geometric Analogy Solver — my vote for the top thesis from the 
incredible first round of “real AI” theses from Marvin Minsky’s students.


1964 
* SRI: Engelbart and English invent “mouse” pointing device 
* RAND: Tom Ellis et al, invent the RAND Tablet, a truly great pointing device 


* Simula I converted Algol into the language that served as the model for C++ in the
early 80s. A great combination of insights (implemented in 1965, and published in the
US in 1966).


* Meta II by Val Schorre was a wonderful tiny and powerful self-bootstrapping way to
create programming languages, including itself. 


1965 
* Ivan Sutherland's IFIP paper inventing virtual reality and augmented reality.


* Butler Lampson’s first designs and writings about systems design, protection, and 
inter-process communication fomented by the "Project Genie” time-sharing system 
(which became the SDS-940, and the computer used by Engelbart for “the mother of 
all demos” in 1968). Butler can’t be too highly praised. 


* Niklaus Wirth — Euler — was a compact detailed runnable example and model of
making a generalization of Algol (kind of Algol meets Lisp as motivated by van
Wijngaarden). A great set of ideas to learn from, especially if his syntax apparatus was
replaced by e.g. Meta-II-in-Euler. Used a small interpretive virtual byte-coded machine
modeled after the earlier B5000 hardware.


1966 


* Gab Groner's great gesture and hand drawn character recognition system at RAND, 
using the RAND tablet. 


1967 


* Dave Evans, Gordon Romney, et al, with breakthrough by John Warnock, University 
of Utah ARPA Project — Invention of modern continuous tone 3D graphics 


1968 


* Doug Engelbart, Bill English, Jeff Rulifson et al — “the mother of all demos” showing 
the public an elevated rich example of many facets of “personal computing”, idea 
linking, collaborative simultaneous creating, etc., from a cosmic vision that has largely 
been lost today. The overall vision and effort can’t be too highly praised. 


* Ivan Sutherland, Bob Sproull, Chuck Seitz et al. — VR and AR for the first time


* Tom Ellis, et al. — The GRAIL System — the first completely pen-based system, 
recognizing drawings and text to make hyperlinked simulations, etc. A landmark in 
“intimate computing” and user interface design. 


* Ned Irons — IMP — the first really workable and convincing most easily fully- 
extensible language 


1969 


* Pat Winston — Learning Structural Descriptions From Examples. This thesis was
published in 1970, but the work was done in the late 60s. A truly wonderful piece of 
work that creates a generalized theory of analogy and representation that was 
motivated by the previous work of Tom Evans, but breaks new ground and extends 
the range of ideas into programming languages. 


* Carl Hewitt — PLANNER — both the spiritual and an actual parent of Prolog and 
many other inference systems to come that were also programming languages. 


* Dave Fisher — A Control Definition Language — also published in 1970 but done in 
the late 60s. This was a powerful comprehensive look at extensible languages — 
especially with regard to making mechanisms to "manipulate time” using Lisp-like 
kernel metaprogramming. 


* Nicholas Negroponte — The Architecture Machine (MIT Press) — published as a
book in 1970, but work done in the 60s. By far the best account of the longer future 
that involved computers. Partly because Nicholas trained as an architect, and partly 
from his larger take on what was around him, this was the deepest synthesis and look- 
ahead at profound future possibilities. Can't be too highly praised: just multiple 
meanings of the dedication alone can be pondered fruitfully for a long time "To the 
first machine that can appreciate the gesture”. 


* ARPA Community — design and building of ARPAnet and first transmissions. Prime 
movers were Bob Taylor, Larry Roberts, Wes Clark, Frank Heart, et al.


<more to come> 


Alan Kay
Still trying to learn how to think better - Updated 4y 


How do I learn to think better?

Originally Answered: How do I learn to think better?

To me, the first thing to notice about human thinking — via scrutinizing the rare 
examples of "first class thinking” we can identify — is that we humans are pretty much 
not genetically equipped for thinking at all! 


We are most well set up for learning how to fit into our cultures, and make our way in 
them, mostly socially. We are a little bit clever, and like other primates are able to 
cheat in numerous ways, but we very often aren't smart enough to also take into our 
minds the consequences of cheating. 


We have “lots of ‘coping genes’ but essentially no ‘progress genes’ " — in fact, it 
appears that even the idea of progress had to be invented (mostly in the 18th century). 
The philosopher and mathematician Alfred North Whitehead remarked that “The great 
invention of the 19th century was Invention itself” (suddenly everyone was inventing!) 


Most of the processes we associate with “good thinking” seem to be inventions, the 
good ones very rare and initially far apart. Despite the power of these, our thinking 
was so dim that it took quite a while to realize that one of the things we should be 
doing is to purposely invent better ways to think, and then figure out how to teach 
them to children to create much more able adults than we are. (We still do not see 
much call for this in most public media.) 


Einstein came up with a great comment: “We cannot solve our problems with the same 
levels of thinking that we used to create them”. 


This links up starkly with the Dunning-Kruger Effect: that many people are not able
to think well enough to see that they are not thinking well enough. There is a very real 
sense in which all of us have this problem. 


A parallel allied concept is sanity, which is usually assessed in a relative normalized 
manner i.e. “what the majority of people in a culture do, and especially if in accord 
with the culture’s norms, is considered sane, and that outside these norms is 
considered not-sane." But if we look at sanity as “the goodness of the mapping 
between what's in a mind and what is actually in the environment in which the mind 
exists”, then we can see that all human beings have “delusional disorders", and many 
of the most dangerous ones are held by much more than a majority of humans! This 
includes the Dunning-Kruger delusion that they are “generally thinking well”. 


Another part of considering what good thinking might be about is to notice that 
ignorance very often resembles stupidity. Imagine being born with twice Leonardo's 
IQ but in 10,000 BCE! 


And Leonardo, supersmart as he was, was not smart enough to invent any useful 
engines for any of his fantasy vehicles. He was in the wrong century — there was not 
enough knowledge for him to use and reshape with his intellect. 


Similarly, it took geniuses to invent calculus but many much less smart people can 
learn it and become more powerful thinkers about many kinds of change than the 
geniuses of antiquity. 


We can then reflect on what someone nowhere near Leonardo or Newton — Henry Ford
— was able to do. Why? Because of the vast change in context — how to look at and
think about the world around us — for which Newton was the main catalyst and
cause.


I think of this as “Knowledge is Silver, Context is Gold, IQ is often a Lead Weight!" 
Or: “Context is worth 80 IQ Points!" 


This is especially true if the Knowledge is (a) drawn from the strongest Contexts, (b) 
some of the Knowledge is the knowledge of Contexts (or Points of View, or 
Perspectives, etc.), and (c) some of the knowledge is what has been learned about how 
to think much better than our genetic minds can by themselves. 


Almost 400 years ago Francis Bacon wrote about the “four Idols” that humans worship 
which confuse our attempts to think. In modern vernacular, we have “bad 
brains/minds” from our genetics, our cultures, our languages, and our academics. He 
called for a “new science" to be invented that would mitigate these as much as 
possible (much of what he called “new science” is what we call “Science”). One part of 
this is the idea that we can find and invent “methods and tools” which when carefully 
learned and used can help us think much better than our traditional processes did. 


A classic study of human thinking problems — both in general and associated with 
language use — is "Science and Sanity” by Alfred Korzybski. 


The field of Cognitive Psychology — of George Miller, Jerome Bruner, etc. — started 
to measure many limits to human thinking, for example that we can only deal with a 
very small number of things at once, and for something new we are almost blind, deaf, 
and dumb because we don't yet have mental organizers for what we have to deal 
with. (This is called “Cognitive Load".) 


A more modern identification of human mental “noises, glitches, and barriers” is part 
of the work of Kahneman and Tversky — see Kahneman’s book “Thinking, Fast and
Slow” — which includes how humans form valuations — the field of “Behavioral
Economics” — and also unearthing the many “Cognitive Biases” we routinely exhibit.
The Wikipedia article lists more than 100 that have been found so far — but readers
will find it easy to identify and add more. For example, not included in the K&T list are 
(a) we confuse both our perceptions with “reality” and “normal” with “reality”, and (b) 
we often will generalize a good enough idea and then make dogma (and even 
religions) from it. 


So we can certainly make a start on learning how to think better by (a) identifying
existing barriers of all kinds, and then finding and inventing heuristic workarounds that
will help eliminate noise and increase clarity, and (b) from the other direction,
identifying the strongest known ways to think clearly — for example, the methods and
tools of science — and learning them so fluently that they will be at least as automatic as
our less able genetic reactions.


Just a note here on this kind of learning. If you remember the stages you went 
through to learn to drive a car, the most striking were (a) the beginning stages that 
involved over-controlling, tunnel vision, not being able to hear the person trying to 
help, not being able to see stop signs and children, not knowing what gear one is in, 
etc. and (b) a few months later being able to steer the car, listen and talk to the other 
person, automatically be aware of stop signs and road conditions, etc. (This is a 
general learning progression for most things.) 


What is happening during the learning is that a lot of the initial effort has to be done
with the cognitively smart but slow part of your brain, and this is easily overloaded.
The learning starts to build little specialists — I call them “brainlets” — that offload
much of the work to faster, less smart, but more routinized parts of the brain. This 
leads to both skills, and a certain resistance to learning different ways to do things. 


Quite a bit of learning to think better is accomplished by doing the various processes 
that will build “brainlets” for helping to think. Some of them will damp down many of 
our genetic responses, and some of them will provide a variety of points of view, 
analogies, etc., and a whole host of heuristics to help. 


Seymour Papert used to say “You can't think about thinking without thinking about 
thinking about something". In other words, it will really help to have important issues 
and ideas, and things to learn about that require much better thinking. Then, the 
many things that are known about thinking, how poor we generally are at it, and
what we've learned about doing it better, can be brought to bear. 


Alan Kay
Have designed a few programming languages - 4y


What made APL programming so revolutionary? 


APL stands for “A Programming Language”, the title of the book in 1962 written by 
Ken Iverson about what initially was called “Iverson Notation”. Part of the reason for 
the “notation” label was that it was used extensively for a number of years as “a notation”
before it was implemented as “APL/360” (on the IBM 360 series of mainframes).


Ken Iverson was essentially a mathematician who also had a physics background.
He trained under Howard Aiken at Harvard in close proximity with the various
computers designed and built there, receiving his PhD in Applied Math, with a thesis
on how to deal with very large sparse matrices.


He started to use mathematical tools to describe computations and computers, and 
soon found these to be lacking. This led to a number of inventions very much in the 
spirit of mathematics that allowed many more structures and operations to be easily 
defined and “notated”, many by “functional projection”. 


One of the most interesting things about “Iverson Notation” at this stage was that not 
having an implementation greatly helped — IMO — what he tried to do at the 
descriptive level: there were no worries about whether this or that could be 
implemented at the time, or whether there would be enough computing capacity for 
speed or space to eventually implement the notation. 


It was in this form that I and many of the other grad students of the mid-60s learned
“Iverson”. My first CS course was from the legendary and wonderful Nick Metropolis,
the main architect and builder of the Los Alamos computers, especially the “Maniac” 
series. Nick liked “Iverson”, and used it extensively for both hardware and software 
descriptions. A year or so later, Bob Barton in his notorious first course in “Systems 
Design", required us to “get and read and learn Iverson's book". 


To motivate what Ken Iverson decided to do, it is worth looking at the history of 
Maxwell's Equations — 4 ideas (can be just 2 or 1) that will fit on a T-shirt. However, 
one of Maxwell's main renditions was not in the form we are familiar with, but was 
expressed as 20 partial differential equations in x, y, z coordinates. 


[Figure: Maxwell's equations written out in x, y, z components as 20 equations,
grouped and labeled: (1) Gauss' Law; (2) equivalent to Gauss' Law for magnetism;
(3) Faraday's Law (with the Lorentz force and Poisson's Law); (4) Ampère-Maxwell Law;
Ohm's Law; the electric elasticity equation (E = D/ε); continuity of charge.]


This is not a great T-shirt! 


Helmholtz and especially Oliver Heaviside did a fair amount of work to use the 
definitional possibilities of mathematics to hide coordinate systems with vectors and 
details of the PDEs, with “operators” (Div, Curl, Gradient ... “and all that”).


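$$
\begin{aligned}
\nabla \cdot \mathbf{D} &= \rho && \text{(1) Gauss' Law} \\
\nabla \cdot \mathbf{B} &= 0 && \text{(2) Gauss' Law for magnetism} \\
\nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}}{\partial t} && \text{(3) Faraday's Law} \\
\nabla \times \mathbf{H} &= \frac{\partial \mathbf{D}}{\partial t} + \mathbf{J} && \text{(4) Ampère-Maxwell Law}
\end{aligned}
$$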


A terrific T-shirt! 


You can think of the operators “gradient” ($\nabla$), “divergence” ($\nabla\cdot$), and “curl” ($\nabla\times$) as “meta”:
they act a bit like macros to rewrite functions in a more complex “decorated form”.
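
As a rough illustration of the “macro” reading, here is what two of these operators expand to in Cartesian coordinates. This is standard vector calculus; the field name $\mathbf{F}$ and its components $F_x, F_y, F_z$ are placeholders chosen for the example, not symbols from the figures above.

$$
\nabla \cdot \mathbf{F} = \frac{\partial F_x}{\partial x} + \frac{\partial F_y}{\partial y} + \frac{\partial F_z}{\partial z}
$$

$$
\nabla \times \mathbf{F} = \left( \frac{\partial F_z}{\partial y} - \frac{\partial F_y}{\partial z},\; \frac{\partial F_x}{\partial z} - \frac{\partial F_z}{\partial x},\; \frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y} \right)
$$

So a single vector equation involving a curl stands for three coordinate equations, one per component, which is how the compact form hides so much of the coordinate bookkeeping.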


The basic idea here is to get “whole ideas into one eyeful” by inventing notations and 
processes that can do this, and consequently requiring readers to learn the new 
notations fluently enough so there is a net benefit rather than just more noise. 


When this is done well, the new “meta-stuff” becomes generally useful (like the grad,
div, curl “and all that” above). An example in APL is the operator “.”, which is a
generalized inner product that can take any APL functions as arguments. For example,
what we think of as “matrix multiplication” is +.× in APL (see inner product in APL).
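
To make the “generalized inner product” idea concrete, here is a minimal Python sketch (not APL, and not how any APL system implements it). The names `inner_product`, `matmul`, and `min_plus` are hypothetical, chosen only for this illustration, and matrices are assumed to be non-empty lists of rows. The point is that the combining function and the pairing function are both arguments.

```python
from functools import reduce
import operator

def inner_product(f, g):
    """Build an APL-style f.g for matrices given as lists of rows.

    Each result cell pairs a row of `a` with a column of `b` using g,
    then folds the paired values together with f.
    (APL folds right to left; for associative f the result is the same.)
    """
    def apply(a, b):
        cols = list(zip(*b))  # columns of b
        return [[reduce(f, map(g, row, col)) for col in cols] for row in a]
    return apply

# Ordinary matrix multiplication: fold with +, pair with * (APL's +.×).
matmul = inner_product(operator.add, operator.mul)
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]

# Same operator, different functions: fold with min, pair with +.
# Applied to a distance table, this gives the best route using up to two hops.
min_plus = inner_product(min, operator.add)
d = [[0, 5, 1],
     [5, 0, 1],
     [1, 1, 0]]
print(min_plus(d, d))  # [[0, 2, 1], [2, 0, 1], [1, 1, 0]]
```

The second use is the interesting one: nothing about the operator changed, only the two functions handed to it.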


People who learn math are quite willing to do this learning and gain the necessary
fluency — but there's considerable evidence that most computer folks are not at all
willing to do lots of training in special tools that would make a difference in “being
professional”.†


This has led to the idea that APL is not readable. In fact, it is both very readable and 
very compact. This is not to say that a face lift wouldn't help — the standard notation 
for APL was derived to fit on an IBM Selectric golf ball typewriter terminal, and could 
be greatly improved today. 


The second interesting idea in APL is “projection”. This is much more relatable today in 
an era of "map/reduce" than it was in the 60s or 70s, even though one could also write 
a good “mapping” function in Lisp (and it was also an “operator” because it could take 
a function as one of its arguments). In the early 70s, Unix happened, and Doug McIlroy
invented “pipes programming” to allow, in this case, “data” to be passed through
“functions” to be reformulated.


However, the big uses and extreme ranges of this way to program were explored
earliest and most extensively in “Iverson Notation” and to a slightly lesser extent in the
actual language “APL”.


Attaining fluency in APL as one of three or so main ways to think about programming 
“is good for one’s mind”. As in the later map/reduce, one "sends" a structure in 
parallel through a cascade of shaping functions and then a cascade of trimming and 
extracting functions to finally get a result. (One must suppress one’s imagination of 
just how big some of the intermediate structures might be getting ... this is also good 
for one’s mind!) 
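
Here is a small Python sketch of that style: shape a large intermediate structure first, then trim it down to the answer. It is loosely modeled on a well-known APL idiom for finding primes. The helper names `outer`, `compress`, and `primes_up_to` are hypothetical, chosen for this example only.

```python
def outer(f, xs, ys):
    """APL-style outer product: apply f to every pair, giving a full table."""
    return [[f(x, y) for y in ys] for x in xs]

def compress(mask, xs):
    """Keep the items of xs whose mask entry is true (like APL's mask/xs)."""
    return [x for keep, x in zip(mask, xs) if keep]

def primes_up_to(n):
    r = list(range(2, n + 1))                     # candidates 2..n
    table = outer(lambda x, y: x * y, r, r)       # shaping: every pairwise product
    products = {p for row in table for p in row}  # flatten the table
    mask = [x not in products for x in r]         # candidates that are never a product
    return compress(mask, r)                      # trimming: extract the survivors

print(primes_up_to(20))  # [2, 3, 5, 7, 11, 13, 17, 19]
```

The multiplication table has roughly n² entries, almost all of them thrown away, which is exactly the kind of intermediate structure one is asked not to worry about.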


There is real clarity to be gained for both writers and readers of APL. 


A number of us in our research group at Parc liked APL quite a bit, and it was clear 
that much more could be done using polymorphic operations and the extension 
features of Smalltalk (only a few of these experiments emerged publicly in the 80s).
But, imagine gazillions of objects provided with “events, suggestions and hints", etc. 


As always, time has moved on (and programming language ideas move much slower 
— and programmers move almost not at all). 


There are several modern APL-like languages today — such as J and K — but I would
criticize them as being too much like the classic APL. It is possible to extract what is 
really great from APL and use it in new language designs without being so tied to the 
past. This would be a great project for some grad students of today: what does the 
APL-perspective mean today, and what kind of great programming language could be 
inspired by it? ††


† This seems rather like the disinclination of so many pop culture musicians to learn to
read and write music notation, despite the tremendous advantages for doing so in 
many areas — and in fact what seems to be a disinclination in much of our culture for 
learning to fluently read and write the written form of their own language. It's not that 
you can't do art in “oral modes”, but that the possibilities for art are so expanded 
when literacy is added. 


†† As an example, a looser more versatile version of this kind of programming can be
done using dataflow between processes that themselves are made from projective 
mappings, and this could yield a very useful and beautiful language. This is what Dan 
Amelang and some of his colleagues did to make the Nile Language, which was 
especially aimed at “graphical mathematics and rendering”. In the STEPS project of 
some years ago, this allowed virtually all of 2.5D “personal computer” graphics — 
including rendering, compositing, filtering, curves, fills, masks, etc., to be defined and 
run in real-time in under 500 lines of code. This replaced an estimated 50,000 to 
100,000 lines of C++. Because of the dataflow and the independence of the mappings, 
this was able to be set up so it could use as many cores as available to run the code. 
(And so forth.) 


500 lines of code is only about 10 pages and it can be shown as an “eyeful” on a
desktop screen:


[Image: the Nile graphics code displayed in full on a single screen]


This is partially low hanging fruit since mathematics does underlie computer graphics 
at all levels. The kinds of ideas that APL first brought to light allow “runnable
mathematics” to be possible (and when it is possible, it is as wonderful as it gets!) 


Alan Kay
Have designed a few programming languages - Updated 2y


What does Alan Kay think about inheritance in object-oriented 
programming? 

Simula I didn't have inheritance (paper ca 1966) and Simula 67 did (paper ca 1968 or
so). I initially liked the idea — it could be useful — but soon realized that something
that would be “mathematically binding” was really needed because the mechanism
itself let too many semantically different things be “done” (aka “kluged”) by the
programmer. For example, nothing of any kind requires a subclass to
resemble a superclass, be a refinement of a superclass, etc. All relies on the cleanliness
of mind of programmers (and even the most clean of these often just do things they 
need when in the throes of debugging). 


So when I designed Smalltalk-72 — and it was a lark for fun while thinking about
Smalltalk-71 — I thought it would be fun to use its Lisp-like dynamics to do
experiments with “differential programming” (meaning: various ways to accomplish
“this is like that except”).


Also, Simula-like classes were perhaps not the best abstract “instance definers and
producers”. They were very Platonic, and perhaps oversimple and overrestrictive. Plato
has Socrates express “two principles” about defining Ideas (in the Phaedrus):


* First, the gathering of scattered particulars into one Idea so that everyone
understands what is being talked about;


* Second, the separation of the Idea into parts by dividing it at the joints as
nature directs, not breaking any part in half as a bad carver might


This is a pretty good early definition of modular design and perspectives on systems. 


Later, Aristotle advocates a looser approach: to gather examples of an Idea rather than
to try to define it (the set of the examples is used as the primary definition). This has
some real benefits: for example, a particular example can be part of the definition of
more than one Idea.


While pondering these ideas in the late 60s, I found that the “real AI” (I have to use
this term today, yikes!) people were doing the most interesting things in trying to
represent meaning — their thinking was far beyond the programming language 
world. And they didn't really care how slowly things ran (which was really a great help 
in thinking in those days!). 


To pick one of my favorite examples, the work, and then the thesis (ca 1970), of Pat 
Winston at MIT — about “Learning Structural Descriptions From Examples” — was
just really good. Pat used “semantic nets” for representations, but if you were thinking
“objects” you could see that his approach was very strong. For example, he was the
first (I think) to have full descriptions for “relators” — in today’s terms, this would be
to have classes for polymorphic methods to (a) protect their generic meanings, and (b) 
to allow relations on them. So a relator like “to the left of” could be asked, “what is 
your inverse?” and it would reply “to the right of". 
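
A minimal Python sketch of the idea that a relation can itself be an object that answers questions. This is an illustration only, not Winston's representation. The class names `Relator` and `Net` and their methods are hypothetical.

```python
class Relator:
    """A relation treated as a first-class object that knows its own inverse."""

    def __init__(self, name):
        self.name = name
        self.inverse = None  # filled in by Relator.pair

    @classmethod
    def pair(cls, name, inverse_name):
        """Create two relators that are each other's inverse."""
        a, b = cls(name), cls(inverse_name)
        a.inverse, b.inverse = b, a
        return a, b

    def __repr__(self):
        return f"Relator({self.name!r})"


class Net:
    """A tiny semantic net storing (subject, relator, object) facts."""

    def __init__(self):
        self.facts = set()

    def assert_fact(self, subject, relator, obj):
        # Recording a fact also records what its inverse relator implies.
        self.facts.add((subject, relator, obj))
        self.facts.add((obj, relator.inverse, subject))

    def holds(self, subject, relator, obj):
        return (subject, relator, obj) in self.facts


left_of, right_of = Relator.pair("to the left of", "to the right of")
print(left_of.inverse.name)  # to the right of

net = Net()
net.assert_fact("cube", left_of, "wedge")
print(net.holds("wedge", right_of, "cube"))  # True
```

Because the relator carries its own description, the net can derive the reverse fact without any code that knows about “left” or “right” specifically.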


I could see that this could be a wonderful thing for OOP programming, not just “OOP
in AI”. For example, it provided one way to deal with search and priority conflicts if you
had multiple superclasses (and any reasonable definition scheme with classes would 
need to deal with something like multiple superclasses). 


Another interesting idea around this time was “relator inheritance” — that is, to have
what an object means to actually be the collection of its parts as a system. Larry Tesler
did some very interesting and compelling experiments in Smalltalk-72 using this idea.


The “real AI” folks at Parc were also doing a number of really good things — mostly in
Lisp — but Ira Goldstein and Danny Bobrow did the PIE system in Smalltalk, and IMO
greatly extended and added good ways to think about “objects” (especially “multiple
perspectives”, which was perhaps the earliest set of designs for what we might call
“traits” today — this was done very well, I thought).


Because our goals at Parc were to invent as rich a version of “personal computing” 
(especially for children) as we were able — and not to invent OOP programming per
se — we followed none of these, nor many other idea trails as they came up, but
concentrated on user interface, simplicity of expression, “eternal execution”, safe and
comprehensive interactive debugging (even of the debugger), etc.


This also came to a head when the plan to replace the Alto in a timely fashion with a
new machine, more powerful along several dimensions,
was not politically feasible with Xerox. This led to the various groups there making do
with what we did have. Going larger and faster for us happened because of the brilliant
design and implementation work of Dan Ingalls and several others in the group. This
resulted in Smalltalk-76, which was almost 200 times faster than ST-72, and could 
manifest about a megabyte of objects. One fallback was to a single inheritance 
hierarchy, and most of the time we were able to control it well. 


Another way to look at all this from a different point of view than from that of a 
programming language, is to think about what the dynamic system of 
intercommunicating objects might look like. Here the richness/simplicity of the 
messages that can be sent, the integrity and transparency of the modules, etc., are 
some of the most important considerations. Then one can go back and ask questions 
about how these are best programmed in ways that will help with the limitations of
programmers.


Endnotes: 


0. I should have emphasized above — and elsewhere — that the idea of *instancing*
is crucial for this kind of systems design. The reason I left classes out of “criteria for
object-oriented” is mentioned above a little. For example, prototyping — as
advocated by Henry Lieberman and done in an interesting way in Self — has its uses
and charms. I spent a few years arguing for “exemplars” that would provide more of a
picture of what instances would look like. (I was upset that the “document class” in
Smalltalk was extremely bare — everything that was interesting about a document
was added on the fly when a document was being made — it was hard to see the
possibilities in the class description.)


I wrote a paper in the late 70s that looked at all of these mechanisms, and many
others, as examples of *views*. That might have been overkill, but semantically, it 
made quite a bit of sense. 


1. I felt somewhat the same way about inheritance as I did about types, in that both
needed to be a lot better than they were in order to pay for the overheads and pitfalls 
of using them. Both of these are basically “control of meaning/semantics” problems. 
There have been efforts — e.g. in MIT's “Flavors” and Eiffel — to pin down some of the 
degrees of freedom to get more meaningful expression. 


2. Speaking of types, another interesting idea for Smalltalk-72 that never made it 
strongly into the language came from the idea of “gauges”, an entity that acted like a 
number in a variable but constrained its values. The idea was to put an instance of this 
into a variable and then never rebind the variable. The instance would contain the 
magnitude, do arithmetic, check value ranges, etc. It could rebind, but only within the 
constraints of the particular gauge. Etc. 


This was possible in ST-72 because "<-" was just a token, and one had to do the 
equivalent of Lisp’s SETQ to rebind a variable (this would look like 


‘a <- foo 


in ST-72). So anytime you wanted “semantic types” you would install a gauge kind of 
entity that would do checking that went beyond simple class checks — for example, 
one could imagine a type for a function called “sine” that enforced the body to only 
do a close enough approximation to a sine function. Typing starts to get useful when 
it becomes more and more semantic. 
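
A rough sketch of the gauge idea, in Python rather than Smalltalk-72. Everything named here is an assumption made for illustration (the class name `Gauge`, its `set` method, the numeric limits); it is not a reconstruction of the ST-72 design. The variable is bound once to a gauge instance, and later updates go through the gauge, which checks them against its own constraints.

```python
class Gauge:
    """Hypothetical number-like object that constrains its own value."""

    def __init__(self, value, low, high):
        self.low = low
        self.high = high
        self.value = None
        self.set(value)

    def set(self, new_value):
        # "Rebinding" happens inside the gauge, and only within its limits.
        if not (self.low <= new_value <= self.high):
            raise ValueError(f"{new_value} is outside [{self.low}, {self.high}]")
        self.value = new_value
        return self

    def __add__(self, other):
        # Arithmetic yields a new gauge with the same limits, so the check runs again.
        amount = other.value if isinstance(other, Gauge) else other
        return Gauge(self.value + amount, self.low, self.high)

    def __repr__(self):
        return f"Gauge({self.value}, low={self.low}, high={self.high})"


# The variable is bound once and never rebound; updates go through the gauge.
temperature = Gauge(20, low=-40, high=120)
temperature.set(35)
warmer = temperature + 10

try:
    temperature.set(500)  # outside the limits, so the gauge refuses
    rejected = False
except ValueError:
    rejected = True
```

A range check is about the simplest possible constraint; the “more semantic” checks described above would replace the body of `set` with something richer.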


3. Olof Salberger in a comment below points out that there has been interesting
progress in dependent types, especially in LiquidHaskell and F-Star. This is all to
the good. However, the add-ons start to diminish the readability of the code (one
good reason for very high level languages is that the code can be smaller, more
expressive, and — with great care in design — more readable and gistable).


Since it is relatively easy to make an experimental language, I would suggest an
approach to an eventual “really nice” statically typed language by first making a
dynamic language that can use objects as “semantic type definers and guards” to be
installed in variables — as explained in Note 2 above — and to make a number of
sample systems with it to find what level of static type-safety really pays for itself, and
to then do co-evolution passes to get the dynamic types in good enough semantic
shape so that all (or most) of what they do dynamically can be done at code writing
time.


The basic point of this comment is that in the end one wants to manifest a running 
system with as few errors of every kind as possible — and *no* fatal errors — some of 
which will have to be detected at run-time. A wonderfully type-safe scheme that 
resists writing (say) an OS or similarly complex systems is not really all that interesting 
and worthwhile in the end. Similarly, a scheme that allows all that is desired to be
written, but only in a horrendous hodge-podge of code is also not interesting and 
worthwhile in the end. 




We want “clean reality 


Alan Kay
Still trying to learn how to think better - Updated 4y


Why did Alan Kay choose rectangular shaped Windowing as the initial style 
of GUI? What were the considerations and impracticality of differently 
shaped windowing approach? 


The 4 answers already given are pretty good. 


First, one of the earliest — if not the earliest — examples of a windowed display with
general clipping was that of Ivan Sutherland's Sketchpad (ca 1962), which windowed a
very large virtual display surface that could be continuously panned and zoomed in
real-time, controlled by some of the many sensor knobs on the machine below the
display. This immediately brought up the problem of clipping lines that were not
completely in the window — and arcs too, because Ivan also used circular arcs as
primitive elements.


The dynamic bridge that Ivan Sutherland is building in 1962 is on a larger virtual 
display than the physical display, and is therefore shown clipped 


On the heels of Sketchpad was “Sketchpad III”, which used much of the Sketchpad code
and added 3D and multiple views of the object to provide the normal 4 views used in 
drafting (3D, plan, elevation, etc.). 


In the mid-60s Ivan decided to invent “virtual reality” using a Perkin-Elmer heads-up
display originally made for helicopter pilots. This required the “virtual world” to be
thought through — which resulted in the general notion of a kind of telescope or
camera, one end of which — the window — was in the virtual world, and the other
was on the display device — the viewport. (Note that this terminology didn't last
because it was difficult not to think of the thing on the display as “a window”.)




How to do fast clipping on calligraphic “line-drawing” displays was a worry,
because the displays after Sketchpad were told to draw a whole line (the Sketchpad
display on TX-2 just drew points). What is called “the mid-point” algorithm was
independently invented by Danny Cohen (with Ivan) and Ed Cheadle (with whom I was
working on the Flex Machine, which also had multiple windowing).
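
To give a feel for clipping by mid-points, here is a small software sketch. It assumes that “the mid-point algorithm” means clipping by repeated halving of the line; the function names, the region bit flags, and the `eps` cutoff are my own illustrative choices, not the original hardware design. The appeal at the time would have been that halving needs only an add and a shift.

```python
LEFT, RIGHT, BELOW, ABOVE = 1, 2, 4, 8


def outcode(x, y, xmin, ymin, xmax, ymax):
    """Bit flags saying on which sides of the window the point (x, y) lies."""
    code = 0
    if x < xmin:
        code |= LEFT
    elif x > xmax:
        code |= RIGHT
    if y < ymin:
        code |= BELOW
    elif y > ymax:
        code |= ABOVE
    return code


def clip_midpoint(p, q, window, eps=1e-6):
    """Return the visible sub-segments of p-q, found by repeated halving."""
    xmin, ymin, xmax, ymax = window
    cp = outcode(p[0], p[1], xmin, ymin, xmax, ymax)
    cq = outcode(q[0], q[1], xmin, ymin, xmax, ymax)
    if cp == 0 and cq == 0:
        return [(p, q)]  # both ends inside: wholly visible
    if cp & cq:
        return []  # both ends beyond the same edge: wholly invisible
    if abs(p[0] - q[0]) < eps and abs(p[1] - q[1]) < eps:
        return []  # a sliver straddling an edge: too small to matter
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    return clip_midpoint(p, mid, window, eps) + clip_midpoint(mid, q, window, eps)


def visible_span(p, q, window):
    """Endpoints of the visible part of p-q, or None if nothing is visible."""
    pieces = clip_midpoint(p, q, window)
    return (pieces[0][0], pieces[-1][1]) if pieces else None


# A horizontal line that sticks out of both sides of a 10 x 10 window.
span = visible_span((-5.0, 5.0), (15.0, 5.0), (0.0, 0.0, 10.0, 10.0))
```

The pieces come back in order from `p` to `q`, and a rectangular window is convex, so the first and last piece bound the visible part to within `eps`.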


So multiple clipping rectangular windows were around in the 60s before Parc. 


The 3D work — mostly at Utah, but in a few other places — had shifted to trying to 
generate “continuous tone” images rather than wire frames, and this resulted in 
searches for efficient ways to determine what color each pixel should have. 


This was not done in real-time back then for a variety of reasons. 


When I thought about the Dynabook in 1968, I posited a single bit per pixel using an
LCD screen (like those starting to be invented by George Heilmeier at RCA and Peter
Brody at Westinghouse).


I could calculate that one needed about 1 million pixels for being able to do most
things somewhat reasonably. This was about 128K bytes — which was much more
than I was hoping for the whole machine! — so I also posited that the LCD would have
some kind of memory built in, but would still be fast enough to do 10-12 frames of
animation.


This is still not much of a display with reasonable resolution of about 100 pixels per 
inch, especially with regard to visual angle. 


It occurred to me that windows on a pixelated display could be easily made to seem 
overlapped, and that refreshing a window that seemed to be occluded would seem to 
bring it up to the surface — and that this would be really great on such a small display 
because it would effectively magnify the display surface available without introducing
modes. 


A few years later at Parc we got to do an “Interim Dynabook” (the Alto, mostly
designed by Chuck Thacker), and in 1972-73 we could only afford 128K bytes for the 
whole machine, but wanted to do a pixelated display not just for the future flat 
screens but because even with 1-bit per pixel you wind up with a general display — 
like paper — on which you can display anything. 


So we sucked it up and decided to have the Alto display be about 1/2 the million 
pixels needed: 808 x 606, and using about half the available memory. (Many tricks 
were used via the Alto’s display list to not have to use all of them all the time!) 
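
The display arithmetic in the last few paragraphs can be checked in a few lines. This is just a worked calculation, and it assumes that “about 1 million pixels” means 2**20 pixels and that “128K bytes” means 128 x 1024 bytes; the variable names are mine.

```python
BITS_PER_BYTE = 8
KIB = 1024

# "About 1 million pixels" at 1 bit per pixel, taken here as 2**20 pixels.
dynabook_pixels = 2**20
dynabook_bytes = dynabook_pixels // BITS_PER_BYTE

# The Alto display as quoted above: 808 x 606 pixels, 1 bit each.
alto_pixels = 808 * 606
alto_bytes = alto_pixels / BITS_PER_BYTE

alto_memory_bytes = 128 * KIB

print(f"Dynabook frame buffer: {dynabook_bytes // KIB}K bytes")
print(f"Alto pixels: {alto_pixels} ({alto_pixels / dynabook_pixels:.0%} of 2**20)")
print(f"Alto frame buffer: {alto_bytes / KIB:.1f}K bytes "
      f"({alto_bytes / alto_memory_bytes:.0%} of 128K)")
```

Under those assumptions the full-size frame buffer comes to exactly 128K bytes, and the Alto's bitmap to a little under half of both the pixel target and the machine's memory.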


Two of the many kinds of graphical objects introduced in Smalltalk were “run-length 
bit-maps” and “two-bit bit-maps”. Both of these allowed arbitrary shapes to be 
composited on the screen (i.e. “images” for desk-top publishing are just windows with 
no boundaries showing, and with the effect of transparent surrounds). 


Meanwhile, back at the ranch ... (I don’t have room here to say much about “display
languages” so will just point ...)


We were all of a mathematical turn of mind, and the desire to mathematise graphics
had been intense even earlier, while many of us were at Utah in the midst of many
kinds of graphics. These were fomented by Ivan bringing his grad students from
Harvard out to “Happy Valley”, which he did because John Warnock at Utah had
invented the first really efficient way to do CT (continuous tone) graphics. The VR
HMD used a home-built matrix multiplier, and — so sweetly — the matrices composed,
so you could turn a whole chain of transformations into a single matrix if the math
had been thought through and prepped.
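
Here is a minimal sketch of what “the matrices composed” buys you. To keep it short it uses 2D points and 3x3 homogeneous matrices, which is an assumption for illustration (3D work of the kind described would use 4x4), and the helper names are my own.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(m, point):
    """Apply a 3x3 homogeneous transform to a 2D point."""
    x, y = point
    v = (x, y, 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(2))


def translate(dx, dy):
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]


def scale(s):
    return [[s, 0.0, 0.0], [0.0, s, 0.0], [0.0, 0.0, 1.0]]


def rotate(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Chain: scale by 2, then rotate 90 degrees, then translate by (10, 0).
steps = [scale(2.0), rotate(math.pi / 2), translate(10.0, 0.0)]

# Pre-multiply the chain once. Later steps go on the left.
combined = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for step in steps:
    combined = matmul(step, combined)

point = (1.0, 0.0)

# Applying each step in turn and applying the single combined matrix agree.
one_by_one = point
for step in steps:
    one_by_one = apply(step, one_by_one)
all_at_once = apply(combined, point)
```

The payoff is that the chain is multiplied out once, and every point after that costs a single matrix application no matter how long the chain was.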


At Utah and then Evans & Sutherland, and then Parc, and then Adobe, John Warnock
and a few others “refined the math” with a few intermediate languages to eventually
produce Postscript. Among many good things, PS also introduced the ideas of
masking and extruding, which provide a kind of generalized clipping.


If Postscript could be run fast enough, it would be a great replacement for “bit-blt”. A 
first try at this was on the NeWS UI for the Andrew system at CMU, which just
barely worked, but was the right idea. The Nile system by Dan Amelang is a recent 
beautiful and comprehensive way to produce graphical images and their 
compositions. 


Alan Kay
Still trying to learn how to think better - 4y


If I want to improve my piano playing capabilities, are there any
mathematical patterns I should be aware of?


John von Neumann said that mathematics is about “relationships about relationships”,
and there are lots of these in most music: some of them are interesting, some even 
profound. 


But I think there are important parallels between poetry and music with regard to
form. One is that a formal nicety is not enough by itself to cover for lack of a content
idea. For example, there are fugues of great complexity that are empty musically — 
Fux was famous for writing some of them. And there are various patterns that can 
provide spice to a composition: 2 against 3, 3 against 4, hemiolas (displaced strong 
beats e.g. playing C E G repeatedly in 4/4 time), etc. There are a number of 
“unplayable” arpeggios e.g. in Brahms where the composer was trying to indicate an 
effect to be pulled off rather than a call for a literal rendition of the score. 


In other words, what is called “prosody” in poetry, theatrics, reading aloud to others, 
etc. is the center of playing any music. This is often easier in musical cultures that 
don't use detailed written notations, because the music — as with the poetry — is 
conveyed “orally” with the poetry/prosody intact. Learning to play music by looking at 
a score on an instrument with limited expression (like a piano) can result in mechanical 
playing and wind up “losing the music”. 


On the other hand, as with learning to read and write natural language, a fluent 
literacy in musical reading and writing is tremendously useful, especially on the piano 
with its vast literature of 10s of 1000s of pieces. 


It is common in theater to have “readings” where actors gather together to read aloud 
a play to each other. This requires the actors to understand the overall play, who
their characters are, and what they are trying to do, both in the large and the small.
Prosody is almost forced because no one wants to sit through several hours of 
droning monotones! 


The need for wind on wind instruments and the need for bowing on the strings help
musical learners realize that the notes are not to be played as written, but as some 
kind of “speech” (sometimes really like speech in some baroque pieces), and especially 
as some kind of singing (most of the time). Notes are not in isolation but are usually in 
groups, with arcs and hesitations and perhaps little flurries (even while feeling an 
underlying pulse — which is not the same as the completely regular ticks of a 
metronome). There's “buoyancy” to it all. 


Another aspect of “buoyancy” in music is called “groove” in jazz. This is different from
“being in the pocket” and “tight” but there are overlaps. Each of these is very
important in classical music also. “Tight” means playing with another player where you
start your notes absolutely together (and usually finish absolutely together). “Groove”
is hard to explain because it is a kind of feeling of “oneness” and “rightness” (a single
player can have “groove”). “In the pocket” is an overlapping idea of both “being tight
with another instrument” and “being right with another player”.


Great classical players have groove. 


As the composer Paul Hindemith pointed out, a piano can be played with a rake! 
There is no wind needed that requires breathing, no bow that requires changing. No 
pitches that have to be controlled. And there are lots of fingers doing lots of things 
somewhat independently. 


And there is the written music, which does not indicate very much about how the 
piece should be played. And there are exercises, which usually have very little musical 
content. All of this contributes to “mechanical tendencies” in the learner's playing. 


What are the remedies? One way to think about this is that you are encouraging 
multiple parts of your brain/mind to learn different things as you work and play on 
musical instruments. Over all is the larger sense of the piece — and this does involve 
the prosody and expression — but there are lots of technical things that do have to be
impressed on the parts of your brain/mind that can do complicated things but can't
think about them well. 


So: IMO singing is the very best thing any musician can do whilst learning. And out
loud at first until you can internalize real singing inside. I also like to listen to
musicians play pieces to get their gists — YouTube is just wonderful for this — and
also to compare with the score to see how the scores only “hint” — like the script of a
play — the art itself is created by the combination of a composer/writer and a player.
This will help build one’s own conception of the piece at hand.


I like to do the technical stuff in two parts: (a) within the context of music (find tough
stuff in a great piece of music and work on it), and (b) as repetitive exercises where the 
only thing is to get more even at various tempos (this is one of the places where a 
metronome actually helps). 


Finally, it's worth noting a few comments from Francois Couperin, one of the great 
keyboard composers from 300 years ago, in his book “The Art Of Playing The 
Harpsichord”. One was that the notated music was not to be taken at all literally, but 
just to be used as a guide. Another was the suggestion that children be kept away 
from the harpsichord for early study, but to put them on a clavichord, which is much 
more expressive. He also suggested that children be initially taught to play a number 
of pieces without using music, and to concentrate on getting them to sound musical.
Then they can be shown what these pieces look like in score, etc. (This is rather like 
having language learning precede learning to read, but not by a lot.) 


As I look back on my learning to play keyboards, I would have done better if I'd stayed
with pieces that I could “play the notes” on, and really worked further to get them to
be much more musically expressive ...


Alan Kay
Still trying to learn how to think better - Updated 2y


How did you lose the bad habits you gained from self-learning a musical 
instrument? 


See the great answers by Ben Simmons and Ed Durbrow. 


After playing many jazz gigs around NYC and Long Island, I wound up in the Air Force
in Texas in the early 60s. I was not in a band there, but got myself a wonderful Gibson
Johnny Smith guitar. I decided to take advantage of not playing gigs to completely
retool my left hand technique, which had evolved to only use the pinkie for chords.
For passage work, I only used the first three left hand fingers. This was definitely
holding back further progress.


This required about a year of doing about an hour of work every day, because it was
not just scales etc., but I had to retool my brain for using the pinkie automatically for
improvisation. After Texas and the Air Force, when I started playing again, I found that
much more work was needed for “playing live under the gun” to not slip backwards
and just “stay on it”.


All in all, perhaps two years to get really comfortable again. Years later I read that it
takes about two years to really ingrain a habit, and about two years to really undo a habit.


Alan Kay
Still trying to learn how to think better - 4y


Among the best guitarists in the world, why aren't there any women? 

There are lots of well known female “best guitarists in the world” in the classical guitar 
world, including with truly incredible chops of the kind valued in the popular and rock 
world. 


But classical guitar is a very extensive art and passion — it is lots lots more than just 
chops, etc. — and it includes deeply contrapuntal music as well, which is quite 
challenging for guitar given the way it is laid out and played. I could choose from
many hundreds of examples, but I was just sorting through arrangements of Bach's
famous “Jesu, Joy Of Man's Desiring”, traditionally played at Christmas time. 


The piece is a decorated chorale with an instantly recognizable accompanying theme 
in triplets against a slower chorale melody in 3/4 time, and was originally part of a 
cantata for voices and orchestra. It has been arranged for many combinations of
instruments, and many times for guitar. 


The arrangement I like the most is David Russell's, in part because — unlike most of
the other arrangements — he successfully combines the two themes really beautifully 
(and ingeniously!) in a number of places just as in Bach's original arrangement for 
choir, strings and winds. 


This results in a piece that is extremely difficult to pull off smoothly, and make it 
sound buoyant and natural with no sense of strain. One of the problems is that the 
slower chorale melody in the upper voice really can’t always be played legato given 
the demands of fingering the triplets melody and the bass (but it still has to have 
“line” and to sound connected so that the actual disconnects don't seem to happen ...) 


Even David Russell — one of the great virtuosos of our time — has a little bit of a 
struggle with his own arrangement. But how about listening to Kaori Muraji, playing
the very same arrangement wonderfully and getting the sense of the full sound of the 
larger ensemble? 


This is like magic (especially for any guitar player to hear and ponder)! 


Alan Kay 
Still trying to learn how to think better - 4y 


How can I improve my piano playing and learn to play faster sections? I can
play them slower than proper tempo but cannot seem to increase speed to
‘tempo’.


Here's where a little more info from you about where you are would be helpful. 
Metronomes and gradual increases are generally helpful at any stage. 


But there's another aspect to fast playing, and that is the mental transfer from thinking 
about each note to thinking about “necklaces of notes”. This is because fast music can 
have more notes per second than the thinking part of our brain can deal with, but we 
have other parts of our brain that can execute very fast sequences as “single things”. 
Thinking about each note creates a barrier. 


One of the best ways to add this “chunked” kind of playing is to pick a metronome 
speed that you can play at, and periodically just double the speed of a bar — no 
matter what happens. This is called “burst mode”. 


The basic idea is to start training the parts of your brain that are going to carry out the 
“string of pearls” passages. Doing slow practice a fair amount of the time is important 
because you don't want to learn a bad version of the fast stuff. 


I found that playing two handed scales in octaves over four octaves with different
metronome tempos — starting pretty slow — but sticking in burst mode for one or 
two of the four octaves, gradually got me to “release” from worrying about each note 
while still paying attention to pulse and connection ... 


I don't know a good book about keyboard technique that would help (it's been many
decades since I studied keyboards), but a book that is popular with classical guitarists
is “Pumping Nylon” by Scott Tennant (an excellent pro who teaches at USC). 


Alan Kay
Still trying to learn how to think better - Updated 4y


To what extent was it possible to build a digital computer during ancient 
Rome? 


This is a very interesting question — right up there with one that I used to put to a
class on computer design: “What computer would Captain Nemo have designed for
the Nautilus? — Sketch out the design for this!”


The current question can be taken in a number of ways: including the meaning “With 
the technologies available during that time could a complete genius — beyond 
Leonardo — have figured out a device that we would call a digital computer?” 


This way of posing the question also allows for “Greek thought” to be employed, since 
the Romans had many Greeks — free and slaves — who did some of the deeper 
thinking for that practical minded people. This is a good combination because the 
Greeks didn't care much for Engineering, and the Romans did — many of the most 
interesting mechanical inventions of the Greeks were toys, especially by the 
Alexandrian Greeks. 


Let's leave aside that one of the problems of slavery is that it removes one large 
motivation for inventing mechanisms to do much general work, including calculations. 
And let's leave aside the question of “build-up” of precursor ideas, which even 
geniuses need and use — the question is about “can something be built?” at a given 
time in history. 


Some of the other answers have correctly noted that both the Greeks and the Romans
calculated using a kind of abacus (the word “calculus” is the term for the stones
moved around on their counting tables). It's important to note in passing that abaci
*do* have “zero” implicit in how they work.


Reproduction of a Roman “pocket” abacus that would fit in a modern day shirt 
pocket. The “calculi” here are not stones but beads in sliding slots. 


As a computer guy I was naturally quite interested in Babbage (and Ada), and was
initially very impressed with the inventions that he had to do as precursors (like careful
mechanical drawing, the turret lathe, etc.) and the inspiration of the Jacquard Loom. It
was fun to get to crank the reproduction of the Difference Engine at the London
Science Museum (super high precision construction, but a lot of backlash and effort).


A few years later I had occasion to really look at the earlier Jacquard Loom, and was
completely knocked out by its astoundingly superior approach to mechanical design; 
even the huge mechanical action pipe organs of the time were not as wonderfully 
done! 


I realized that Babbage had taken a very poor engineering approach — he was a
mathematician, and tried to do things “exactly” — and really should have looked at 
the actual mechanisms of the Jacquard Loom much more closely. 


Only the top 1/3 is the Jacquard mechanism. The whole loom is mostly wood 
with a little mechanism (including a lot of logic made from “coat hangers”). 


Jacquard went completely in the opposite direction: everything is just flopping around
except and only except when he needed precision. A lot of the precision he got was via
“guide holes” that would constrain the floppy things but let them flop on either side
(one way to look at this is that the guides were to “reduce errors” periodically in the
chains of causes and effects).


The result is that there is virtually no propagation and multiplication of error (in 
Babbage’s scheme you get enormous propagation of error). One of the many fun 
things about Jacquard’s approach is that the mechanism does not require inventing a 
new kind of loom: the Jacquard apparatus sits on top of an existing loom and simply 
pulls the threads according to the program on the punched card chain. 


So one person operating a Jacquard Loom could affect thousands of threads via the 
punched card programs with human power alone. 


What's interesting here, is that there is nothing tricky in the Jacquard Loom 
mechanically — large parts of the logic can be built from coat hangers! — and it could 
have been made using just the technologies available on either side of 0 BCE. Here is a
short precis of some of the technologies available.


The Wikipedia article on the Jacquard Loom is a start, but lacks enough detail. I'm
still looking for the best description of the Jacquard Loom mechanism online (please
help!). What's important here are the actual details of how Jacquard saw to make
selection not require careful precision except only where actually needed. For now try
this pdf starting at page 5. And try this YouTube video to see some of the different
parts at work driven by one person.


There are several books with useful details: Essinger, James (2004). Jacquard's Web.
Oxford University Press, Oxford, and especially: Bradbury, Fred (1912). Jacquard
Mechanisms and Harness Mounting. John Heywood Ltd., Technical Book Depot, Halifax,
Yorks. The best book I found was: (1888) The Jacquard Machine Analyzed and
Explained, by E. A. Posselt. Thanks to the wonderful Brewster Kahle and his Archive.org,
this book can be found and downloaded via Google.


I think it would be quite possible to use modern thinking to design a programmable 
computer based strongly on the Jacquard mechanisms, and then to build it using only 
techniques, tools, and materials available ca 0 BCE. 


I will devote the next week to trying to find an understandable online account of how
the Jacquard Loom actually works (it is almost always the case that museums — even 
“science and technology” museums — just exhibit artifacts, and do not provide 
explanations or demonstrations of *how* and *why* they work). 


Addition I


I had forgotten that I'd written about Babbage for Quora, and mentioned there more
detail about the Jacquard Loom: “What's the best book about Charles Babbage in
terms of technical detail?”


I found a pretty good YouTube video from the V & A Museum that includes a 3D
animated explanation of the Jacquard invention. 


It seems to not have a sound track, but I think the whole visual explanation is pretty
clear. You might want to run the animation at 1/2 speed to really be able to follow the 
cause and effect relationships. 


Addition II 


Marcel Levy asked a question that is worth putting in the main body here: “So you're 
saying it was not the practical side that was lacking but the theoretical one?”


I think: Yes. It's worth looking at some of the mechanical inventions of the Alexandrian 
Greeks, including: the pipe organ played with keys with wind stabilized by water 
pressure, etc. Also Greek theater mechanics, etc. And of course, the Antikythera 
astronomical calculator. 


Heron of Alexandria was the most interesting inventor of mechanisms ca 0 BCE.
Besides the pipe organ, there was reportedly an odometer that counted digitally, and
most interestingly a programmable cart (a recreation here on YouTube).


What else did Heron do along the lines of these “toys to amuse and amaze”? He had 
the idea of “programming”, and probably devised other forms of it for some of the 
automata he came up with. 


Looms had existed for thousands of years, and elaborate ones for hundreds. 


The Greeks (and the Romans following) realized that the cultural “official” way to write 
numbers was “inconvenient”: we can see their real thinking by looking at their 
calculating machines (the various abaci).


We could definitely ask questions about “what did they really want to compute that
was outside the scope of their abaci?” Babbage was driven initially by inaccuracies in
mathematical tables (“I wish to God these calculations had been executed by steam!”).
Turing was initially driven by Goedel's results. Physical computing was driven by needs
of warfare from WWII onward.


They most definitely had the brains and much of the mathematical and physical 
outlook. It looks to me that they lacked the feeling of need more than anything else 
that would drive them to use what they knew to see how to make a machine to 
compute for them. 


Alan Kay
I'm the “Alan Kay” in question (try Google for the usual misinformation) - 4y


What is the most difficult concept to grasp in computer science? 


I'm not sure what “computer science” means to the requester. Almost 15 years ago I
had occasion to give talks at a number of universities, and at each one asked the
audience of students and professors for their definition of the term. Overwhelmingly,
the definitions would be called “engineering ones”, and just what “science” is actually
about never surfaced. When I conduct an occasional class at UCLA, I've gotten the
same response.


So, one answer to this question could be “Computer Science itself”. 


However, what jumped into my mind was a substitute question: “What do you see the 
least understanding of in the ‘not-quite-a-field’ of computing?” 


The three things that hit me in the face — via 56 years of observations — are partly
related I think: (a) the hold-over from the 50s of an intense focus on “algorithms and
data structures” (as opposed to processes and systems), (b) the almost absence of 
“informed design” and “feel for scaling” (as opposed to taking the larger situations 
into account), and (c) the lack of “trying to help every kind of user learn more and 
become more empowered in computing itself”. 


None of these are fair as stated, because a few people *do* make careful systems 
composed of processes, *do* careful design for integrity and scaling, etc., and *do* 
care about lifting the society as well as just giving them servants. 


Still, I think it’s worth pondering just what could be thought about and done by taking
a much larger look at what computing is actually about, and what today’s world and
societies actually need.


I think this would illuminate many areas of (a) what didn't progress when it needed to 
as Moore's Law exploded and (b) what didn't happen in the understanding of how to 
make new media a positive force in human life. 


Alan Kay
Still trying to learn how to think better - 4y


In a chord sheet, what do the 2 chords with a slash, like Am/F#, mean?
Playing them together on a piano makes a dissonance.

It's worth thinking about this chord a bit. As the other answers indicate, the slash 
followed by a name means play that note in the bass. 


If you play the chord with normal Am spelling — A, C, E, F# — you can see that the F# 
is the “6th” position in the chord, and this would be called Am6. A nice sound! 


The inversions of this chord are: first inversion: Am/C, second inversion: Am/E, third 
inversion: Am/F#. 


But it is also worth looking at this chord as though F# is the root. In normal spelling 
this would be F#, A, C, E. 


So: almost an F#minor 7th (F#m7) if the C were a C#. So we can call this chord an 
F#m7-5 (or F#m7b5). You will often see this chord in jazz charts. If you play this in this 
spelling near middle C it will not sound at all dissonant, but vaguely mournful or 
ethereal. 


Now notice that the first 3 notes are all minor 3rds apart — F#, A, C — so it would be
a diminished chord except for the major 3rd between C and E. In honor of all the
minor 3rds, this chord is also called a “half-diminished” chord, sometimes notated F#ø
— in other words a diminished ° with a slash through it.
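The interval counting above can be checked mechanically. Here is a minimal sketch, assuming equal-tempered pitch classes numbered from C = 0; the table and function names are made up for illustration:

```python
# Pitch classes in 12-tone equal temperament, with C = 0.
PITCH_CLASS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
               "F#": 6, "G": 7, "G#": 8, "A": 9, "Bb": 10, "B": 11}

def stacked_intervals(notes):
    """Semitones between each adjacent pair of notes, reading upward."""
    pcs = [PITCH_CLASS[n] for n in notes]
    return [(b - a) % 12 for a, b in zip(pcs, pcs[1:])]

def intervals_from_root(notes):
    """Semitones of every note above the first (root) note."""
    root = PITCH_CLASS[notes[0]]
    return [(PITCH_CLASS[n] - root) % 12 for n in notes]

print(stacked_intervals(["A", "C", "E", "F#"]))    # Am6 spelling
print(stacked_intervals(["F#", "A", "C", "E"]))    # F#m7b5 spelling
print(intervals_from_root(["F#", "A", "C", "E"]))  # measured from F#
```

A 3 is a minor 3rd and a 4 is a major 3rd, so the F# spelling shows two minor 3rds followed by one major 3rd. Measured from the root it gives a minor 3rd (3), a flatted 5th (6) and a minor 7th (10), which is the m7b5 reading.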


It is worth playing with this chord to find the places “where it wants to go” (there are 
lots of possibilities). Then you could look up “half-diminished” in Wikipedia for a more 
in depth discussion. 


One way to think about this is to ponder “where does the bass note ‘want’ to go?”, 
first looking at half steps and then whole steps. What if the F# goes to G? Does that 
make any sense? (what does it sound like?). 


One way to think about this is that the F#, A, C, E, is kind of a D9 (a V chord of G) 
without the D. So this could be a way to get to either G major or G minor. Can you 
think of other chords it could resolve to with G in the bass? 


Going the other direction, to F from F#, we could expand the E up to F, so the 
resolution would be to F major. We could also lead to a Bb/F or a Bb7/F, etc. 


As with other ambiguous chords, you can try other roots to see if they supply more
perspective. If you use C as the root, you get: C E F# A. This could be thought of as
C6-5 (you don't see this very often, but you will see its cousin C7-5 (C E F# Bb)
everywhere because of its chameleon properties of being two symmetric chords in
two different keys at the same time) — can you puzzle this out?


Alan Kay
Have designed a few programming languages - 4y

What was the first computer you worked with, and in what language did 
you program it? 

I answered part of this question in a related question: When you were learning
assembly language, what assembly language did you study?


My first real computer was the IBM 1401 in 1962 in the Air Force. It was a
“character-based computer” with 6-bit characters (short bytes) accompanied by a 7th bit,
which acted as a “word-mark” (an 8th bit was used for parity checking). See “IBM 1401” on
Wikipedia.


This allowed variable length “fields” to be defined just by setting a word mark at the 
left end of a sequence of characters, and the field would be addressed by the address 
of the right end of the sequence. 


The encoding of the 6 bits for data was “BCD”, which mapped a column of a punched
card into B A 8 4 2 1. This allowed the fields on a card to be mapped into memory
using word marks (the cards would initially be read into a set location in memory).


[Image: an IBM punched card]


The top two rows were mapped into the B A bits. The “numeric rows” were mapped as 
one number into the 8 4 2 1 bits. 


For example, the word-mark scheme allowed two numbers of different sizes to be added
together a character at a time by the ADD instruction, given the two addresses of the
low order characters of the fields. The first word mark encountered would stop the
operation (and one can imagine the need to make sure the receiving field would be
the right (longer) size).
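To make the idea concrete, here is a toy model of word-mark addition. It is not a faithful 1401 emulation. It assumes one decimal digit per character, ignores signs and zone bits, and assumes that the receiving field's word mark ends the operation while a shorter source field simply stops contributing digits once its own word mark has been passed. The `Cell` class and all function names are invented for this sketch.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    digit: int = 0           # one decimal digit per character position
    word_mark: bool = False  # set on the leftmost (high-order) cell of a field

def store_field(memory, right_addr, digits):
    """Write digits so the low-order digit sits at right_addr; mark the left end."""
    left_addr = right_addr - len(digits) + 1
    for offset, d in enumerate(digits):
        memory[left_addr + offset] = Cell(d, word_mark=(offset == 0))

def add_fields(memory, a_addr, b_addr):
    """Add field A into field B, one character at a time, right to left."""
    carry, a_done = 0, False
    while True:
        a_digit = 0 if a_done else memory[a_addr].digit
        total = memory[b_addr].digit + a_digit + carry
        memory[b_addr].digit, carry = total % 10, total // 10
        if not a_done and memory[a_addr].word_mark:
            a_done = True    # source field exhausted (assumed rule)
        if memory[b_addr].word_mark:
            return carry     # left end of the receiving field: stop
        a_addr, b_addr = a_addr - 1, b_addr - 1

def read_field(memory, right_addr):
    """Collect digits leftward from right_addr until a word mark is found."""
    digits, addr = [], right_addr
    while True:
        digits.append(memory[addr].digit)
        if memory[addr].word_mark:
            return digits[::-1]
        addr -= 1

memory = [Cell() for _ in range(20)]
store_field(memory, 4, [9, 7])          # A = 97, low-order digit at address 4
store_field(memory, 12, [0, 1, 2, 5])   # B = 0125, low-order digit at address 12
overflow = add_fields(memory, 4, 12)
print(read_field(memory, 12), overflow)  # [0, 2, 2, 2] 0
```

Both fields are addressed by their right ends, and no length is passed anywhere: the word marks alone tell the loop where each field stops. If the receiving field is too short, the final carry has nowhere to go, which is why its size had to be planned.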


The word marks were also used to allow variable sized instruction formats that could 
make use of a computation “so far". For example, an "ADD" instruction could be in a 
two address, or one address, or no address format. Clever use here could allow very 
tiny programs to be written that did a lot. 


And from the systems point of view, the word-marks allowed a lot of extreme 
cleverness to be used on these very small machines (they went down to 1400 
character memory size). Memory management was a large part of writing an effective 
program for the 1401. 


At the Air Force Air Training Command computer center we had an 8K character 
machine (which was regarded as a luxury by 1401 programmers in smaller shops). We 
found that it was possible to write a kind of operating system and “overlord” that 
would fit just barely into the top 1K characters ... 


Systems programming on the 1401 was also aided by its first class tape drives, which 
were both very fast and had extremely fast rewinding speeds. We had 6 drives, and 
would often use one as a swapping (overlay) memory device. 


Alan Kay
Still trying to learn how to think better - Updated 4y


To what extent did the Xerox PARC Alto inspire the Apple Macintosh? 
Originally Answered: To what extent did the Alto inspire the Mac? 


I wish it had, beyond gross appearances. The bitmap display and mouse pointing
device were the most obvious inspirations.


The Alto was intended to amplify a wide range of users. 


Some of the software on the Alto — such as Smalltalk with its overlapping window GUI —
was directly responsible for the overall look and feel of the Mac (and before that, the
Lisa).


Photo of Alto screen showing Smalltalk-74 right before Smalltalk-76


The Alto — Chuck Thacker main designer — which first appeared in 1973, was a 
“parametric machine” of 16 virtual processors with 0-overhead-interleaving driven by 
microcode using an engine that was 5-6 times faster than main memory. The minimal 
display on the Alto was 808x606 (about 490,000 pixels). 


[Figure 1: Processor Data Path]
The overall architecture of the Alto, with format of a microcode word below


11 years later, the Mac was a rather underpowered Motorola 68000, whose CPU was 
along the lines of a mini-computer (like a PDP-11 but not a Data General Nova). The 
Mac display was 512 by 342 (about 175K pixels, a little more than just 1/3 the pixels 
on the Alto 11 years earlier). 
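The pixel comparison is simple arithmetic, checked here with the display sizes quoted above:

```python
alto_pixels = 808 * 606   # minimal Alto display
mac_pixels = 512 * 342    # Mac display, 11 years later

print(alto_pixels)                          # 489648
print(mac_pixels)                           # 175104
print(round(mac_pixels / alto_pixels, 3))   # 0.358
```

So the Mac screen had roughly 36% of the Alto's pixels, a little more than 1/3.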


However, it is worth looking at the earlier Lisa, which was more of what people 
needed, but was about the price of an average US car (to me, a bargain for one’s 
information vehicle! but not to most consumers and businesses). 


One of the significant design strengths of the Alto was that the reloadable microcode 
allowed customized efficient virtual machines for very high level languages to be built, 
and be modified to introduce optimizations as they were discovered to be needed. 


The microcode also allowed new real-time display and sound primitives (such as bitblt,
the universal screen painting primitive, 2.5D half-tone animation, and both 12 voice
polytimbral real-time sampling and 8 voice polytimbral FM synthesis) to be done
several years after the Alto had been designed.


And most of what is usually HW in all the peripheral controllers was simulated by the
microcode (including controllers for the display, disk, keyboards, mouse, handset,
sound, Ethernet, and up to 1000 input lines). This both saved a lot of money in making
these machines, and also provided much more flexibility.


Software is a very good idea! (But it also needs to be taken deeply into the 
“hardware"!) 


Taking the 11 year difference into account, the various computers designed and built
at Parc in the early 80s still had a microcode architecture but were now able to use
several of the AMD “bit-slice” CPU components. This would have been a great choice
for Apple to take up — it would have revolutionized personal computing — but most
of the Apple technical people were very uncomfortable with bases that weren't like
mini-computers.


Both Intel and Motorola could have made chips that were more “parametric” with 
regard to their function. This never happened despite the fact that most versions of 
the chips they did make had various kinds of fixed immutable “micro” and “nano” 
coding internally. 


Apple routinely blew chance after chance to have a dual strategy for the future by 
rejecting a number of opportunities to also have a line of workstations, which by 
Moore's Law would be Apple's future Macs 5-7 years out. 


Alan Kay
Have designed a few programming languages - Updated 4y


Is there static typing for a Smalltalk derivative? 


Besides looking at the past, we should look to the present and the future. 


The basic idea of “object” is to invest a “full virtual computer” so it can provide useful 
services via messages in a machine independent way and with enough encapsulation 
to be safe against attack. 


This also means that everything can and should be an object because (a) a full virtual 
computer can simulate any other kind of thing, and (b) because this also allows 
objects to be self-defining, self-documenting, etc. 


This means objects should be transportable across machines and requires the
inter-object messaging to be done well (i.e. messages are objects, an object gets to
another machine by being sent as a message, etc.).


An object can contain everything it needs above the simplest VM substrate. But what 
if we want our object to be able to deal with external resources? From inside the 
object we could decide that a particular external resource will have the local name of 
“foo”. We expect that “foo” will be bound to “something” that is exterior, but to which 
we can exchange messages, and both use — and perhaps provide — services. 


Now let us use the fact that we are 45 years removed from Smalltalk, and now have 
the large universal network that my ARPA/Parc research community was also working 
on. We first note that if we have a good idea of the receiver, then we can definitely 
send a message anywhere on the Internet, and this can work just fine. We will have to 
program in much less of a subroutine style than Smalltalkers did in the 70s (and 
probably still do now). In other words, we really are sending messages, and the replies 
might take quite a while, might not happen, etc. This is OK — the ARPA/Parc 
community was a “full-duplex” community and all of its network inventions were full- 
duplex. 


There's a lot to be said for full-duplex programming. One set of examples I like is
from Erlang and its offspring. Using something like LINDA as a messaging “broker”
makes sense, etc.


On such a world-wide network, we might not know the local names of useful services 
and resources, and they might not know enough about our particular object to help 
much. 


I think this is a good place to ponder the kinds of “modern semantic typing” that are
needed to deal with the enormous scales that happened, since we were able to do a
pretty good account and demo of “personal computing” in about 10,000 lines of code
in Smalltalk-78 on a Dorado.


Let me start the discussion with readers, here, with three points. 


* First we need to find the resources that would help us, and we don’t know their
names. However, when we install “foo” we could associate with it the best set of stuff 
we can come up with to assist a local search engine to find that resource. 


One starting place could be with something like LINDA++ where we are matching
descriptions using a more universal vocabulary. But I think we are beyond that point
already.


I think we should at least start with ideas such as those Doug Lenat had for AM
(Automated Mathematician) and Eurisko (both of these systems were active and had
lots of heuristic methods to help decide that “this thing is like that thing”).


One thing that would be interesting to associate with "foo" is a simulation of the 
resource we are looking for. This is something that would be more semantic than 
pragmatic, but which could eventually produce typical replies to messages. 


We can now imagine that to bind to a resource, our object (and object system) will be 
able to negotiate actively with the environment of the object to get interior views of 
exterior resources. 


* Secondly we should imagine that when we are sending messages under the 
conditions of possible enormous scales that every message is not an encapsulated 
“telegram”, but has the potential of being a full-fledged ambassador, so that it can 
likewise take part in real negotiation of meaning at the receiving end. 


* Third we could imagine an extension of a “publish and subscribe” scheme as the 
general way to build systems. The basic idea here is a bit more biological and 
stochastic. The matching and negotiation processes would be used so that each 
object has two public billboards, one for “requests for resources, help, etc.” and the 
other “offers to the general good”. The semantic matchmaker will make useful loose 
couplings, and very large, very robust systems can be made. 
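As a very rough illustration of the “two billboards” idea, here is a toy matchmaker. Plain keyword overlap stands in for the much richer semantic negotiation argued for above, and the `Agent` class, the scoring rule and the threshold are all assumptions made only for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    requests: list = field(default_factory=list)  # billboard 1: what I need
    offers: list = field(default_factory=list)    # billboard 2: what I provide

def overlap(request, offer):
    """Jaccard similarity of two sets of descriptive terms (0.0 to 1.0)."""
    union = request | offer
    return len(request & offer) / len(union) if union else 0.0

def matchmake(agents, threshold=0.5):
    """Loosely couple each request to the best-scoring offer from another agent."""
    couplings = []
    for seeker in agents:
        for request in seeker.requests:
            candidates = [(overlap(request, offer), provider.name)
                          for provider in agents if provider is not seeker
                          for offer in provider.offers]
            if candidates:
                score, provider_name = max(candidates)
                if score >= threshold:
                    couplings.append((seeker.name, provider_name, score))
    return couplings

agents = [
    Agent("editor", requests=[{"spell", "check", "english"}]),
    Agent("speller", offers=[{"spell", "check", "english", "french"}]),
    Agent("clock", offers=[{"time", "utc"}]),
]
print(matchmake(agents))  # [('editor', 'speller', 0.75)]
```

Note that the editor never names the speller. The coupling comes from matching descriptions, so either side can be replaced without the other knowing.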


The only system I've seen over the last few years that “feels” like this will seem to
have little to do with what I've been discussing:


General Game Playing, by Mike Genesereth at Stanford.


But consider all the ramifications of what it means to write a program that is able to 
look at “requirements for a possibly very complex ‘game’ ", and to generate a process 
that can then synthesize a game playing agent from those requirements. 


In short, I like “types” if they can really pay their way by dealing with the semantics
that really count!


Alan Kay
Still trying to learn how to think better - Updated 2y


Which of your guitars would you keep if you had to give them all up, but 
one, for the rest of your life? Let’s assume you own multiple guitars 
(acoustic, electric, nylon, steel, 12-string, etc.) 

Like J Robb Wilson, I also have a 1962 Gibson Johnny Smith that I got in 1962 while in
the Air Force, and played it on jobs for many years. I kept it in mint condition. Hard to
imagine parting with such a source of memories.


Gibson Johnny Smith (not mine, but a spitting image from Google images) 


A really wonderful guitar in so many ways (basically Gibson's successful attempt to get 
Johnny Smith to endorse Gibson via making a pretty close approximation to Smith's 
D’Angelico New Yorker that Smith was willing to play in public). 


However, I think my Roger Borys B-120 is a better jazz guitar all around.


Roger Borys B-120 (not mine, but identical style from Borys’ website) 


I got it about 10 years ago when I started playing guitar again, and noticed that
several of my most admired guitarists around LA — e.g. Larry Koonse, Tom Rizzo —
played Boryses. A perfect “tenor jazz sound” created by the combination of a
special laminated sound board and a custom designed pickup. Spiritually, I think it
could be traced back to the classic Gibson ES-175, Jimmy D’Aquisto, etc.


As with most guitarists, I have a lot of guitars, including a number of really good
classical instruments — a 2006 Jose Ramirez 1a, a 1950 Miguel Rodriguez, a couple of
Kenny Hills, etc. There are two Kirk Sand “crossover” nylon string instruments that are
superb. A Martin kissing cousin to “Trigger”. I've got two very good Telecasters (one
especially good). I have an early original Steinberger “trans-trem”, an almost perfect
electric guitar in almost all respects. Etc.


But, about 5 years ago a variety of causes — including age, arthritis, etc. — left me
with a left thumb joint with nothing good between the bones of the base joint. It was
not repairable and very painful and swollen, so I had to quit.


Then a few months ago I was inspired by a great lute player — Thomas Dunford — to
see what a lute was like to play, and found it was easier on the left thumb.


One thing led to another to early 19th century guitars with short scales, small necks,
lower tension strings, etc. I found that playing these with the fingertips like the lute —
without nails — worked really well, and the shorter scale especially made all the barres
much easier to do with very little strength.


And this led me to Michael Thames of Santa Fe, one of the premier classical guitar
makers in the US, who also likes to make 19th century guitars, including some
classic early Torres models and several instruments from earlier in the century. I got to try
one of his instruments and found the action to be amazing — this is related to him
being a very good guitarist himself.


This started a project for him to build me a “Mirecourt” style instrument ca 1820 with a 
short scale (620mm), narrow neck (48mm), and a few other ideas including two extra 
outrigger bass strings (which often could be found on these instruments back then). 
You tune them to notes “that will help”. 


Michael Thames “Mirecourt” 


I've now had this guitar for a little over a month, and I have to say that it is really the
best guitar I've ever had the chance to play. Just amazing in both feel and tone. Just
being able to play again after thinking I never would is a big deal that only musicians
can really understand.


This would be the one I wouldn't give up for anything!


2020 Update — Except ... that my left thumb continued to get worse, and I wound up
not being able to play “Mira”. But as this was happening, I accidentally attended a
wonderful concert by the young lutenist Thomas Dunford, and was transported. I
asked the local lute society about left thumb pressure, and they said much less. I have
been happily learning it for about a year now. The thumb pressure is much less — I
really have to watch it — but I'm able to play. The lute is much less like a guitar to play
than I thought it would be. The increased possibilities for expression and the low gut
frets and doubled strings require a lot more careful muscular learning, so my progress
has been slower than I first thought. But it's quite wonderful.


After trying and buying quite a few lutes, I was very lucky at the end of 2019 to have a
truly amazing Renaissance style instrument come my way, made by perhaps the
greatest living lute maker: Paul Thomson.


I'm very happy to be able to play this (and am paying a lot of attention to just how the 
left thumb feels in the hope of prolonging the joy of playing). 


Alan Kay
Still trying to learn how to think better - 4y
Did Xerox PARC write human interface guidelines? 


Yes. 


In the early days, we did many little test systems and many dozens of experiments, 
first on a special high-bandwidth and resolution “video generator” with a memory that 
could be used to store bit-map images (including for text characters). This experience 
was fed into the design of the Parc Alto, on which most of the subsequent 
experiments were done. 


Also during that period, William Newman and Bob Sproull (both at Parc) finished and 
published (in 1973) their book on “Principles of Interactive Computer Graphics", which 
contained pretty much everything known up to that point, including examples of most 
UI techniques (this is worth perusing to get a snapshot of that time). It didn't contain 
the bit-map graphics that was being developed using the Parc Alto. 


A few years later, there were a number of examples of “interesting” UIs, including the
Smalltalk system, “Markup”, etc.


The two most interesting examples of “principled” human interface guidelines that
were written up as design documents post the mid-70s were those for “OfficeTalk” —
a terrific design for a hard problem, and very carefully thought through. There is a
description of this very interesting system by Skip Ellis and Gary Nutt. Others deeply
involved in OfficeTalk were William Newman, Tim Mott, Dan Swinehart, Jeff Rulifson, et
al.


And a little later the UI guidelines for the Star computer design were composed (which 
were also very carefully thought through). 


I don't have the internal design document at hand, so I'm a bit fuzzy on all the
contributors, but they included: David Canfield Smith, William Newman, Larry Tesler,
Bill Verplank, Charles Irby, et al. The retrospective document is here.


Alan Kay
Still trying to learn how to think better - Updated 2y
What was the Whole Earth Catalog? 


It was a series of wonderful collections of “access to tools” especially including “tools 
for thought” of hundreds of books worth reading (the first 500 books to go in the 
Xerox PARC library were *all* the books in the Whole Earth Catalog). Much of its 
approach was systems-based, and this gives it a very future oriented point of view. 


[Image: cover of “The Last Whole Earth Catalog”]


There are also hundreds of physical tools of every kind, many for self-sufficiency, 
housing, farming, making, etc. 




It first appeared in the late 60s, continued through the early 70s, and had a very large 
positive effect on the thinking of both “the 60s" and of the ARPA/Parc research 
community that was in the process of inventing much of the interactive computing 
and networking technologies that we still use today. 


It was primarily the inspired brain-child of Stewart Brand. 


The high point was "The Last Whole Earth Catalog" of June, 1971, which won the 
National Book Award. (Of course there were further editions ...) 


I like to have one of these around in both Los Angeles and in London ...


Almost all of these books are worth reading today, as is the Whole Earth Catalog itself.
(It would be nice if the Wikipedia people had studied it and learned from it.)


But the Wikipedia article on it is well worth reading: Whole Earth Catalog - Wikipedia


And most of the editions of the Whole Earth Catalog are scanned online with pdf
versions, so you can download and read them. Back Issues - Whole Earth Catalog


It is worth perusing the entire Whole Earth Catalog website. 


Added July 2021 — this website now seems to be down or missing — but thanks to 
Frank Heile (in comments below), here is a link to the pdfs 


http://charlesbroskoski.com/_/view.php?id=whole-earth-catalog


Alan Kay
Still trying to learn how to think better - 4y


What are the successors / continuations of the "talking typewriter" 
project/idea referred to by Alan Kay? 
I was hoping someone would answer this so I could find out!


The question references a paper I wrote in 1972. About 5 years ago I was asked to
write an essay on “The Future Of Reading” in which I referenced both the original
“talking typewriter” for children by O.K. Moore (a terrific thinker), and a follow-on 20
years later in the 1980s done by John Henry Martin with IBM.


I also discussed some of the things that could be done now and in the future, but I
wasn't aware of contemporary work that was actually making systems that could be
used by children.


Is anyone out there aware of next steps with children learning how to read by learning
how to write with a “responsive writing and reading machine”?


Alan Kay
Still trying to learn how to think better - 4y


What did ARPA-IPTO / PARC learn about how to do good problem-finding? 


Interesting Quora glitch (this question didn't automatically get sent to me — I just
found it randomly).


JCR Licklider ("Lick") had a nose for large visions (you can experience his early reaction 
to interactive computing in his 1960 paper “Man-Computer Symbiosis”). This plus the 
fact that he was well regarded by a number of important people in the DoD, wound 
up with them — in 1962 — giving him quite a bit of funding by computer standards 
to set up IPTO (the “Information Processing Techniques Office”) within ARPA (what we 
just call “ARPA” today with regard to computing research). 


When asked what he was funding, he would just articulate the one sentence version of 
“the vision”: The destiny of computers is to become interactive intellectual amplifiers for 
all of humanity pervasively networked world-wide. 


One of his ideas was to not have the ARPA-IPTO office come up with “goals” etc. (he 
used to say: you can't have good ideas within the Beltway (still seems to be true!)). 
Instead, he would try to find and fund “great people” who would go to the next levels 
of “problem-finding” and “goal-setting”. 


For example, by 1962, Doug Engelbart had spent some years putting together a 144 
page overview and plan “Augmenting Human Intellect: A Conceptual Framework" that 
still today is quite an amazing collection of important ideas, perspectives, things that 
need to be done, etc.* Engelbart was originally funded by the Air Force, then NASA 
(by Bob Taylor who went on to be an ARPA funder and who later set up Xerox Parc). 
But Doug got his main funding from ARPA-IPTO. 


Engelbart was a really good “problem-finder”. One can learn a lot by really 
understanding what was shown 6 years later in “The Mother Of All Demos” (MOAD) in 
1968 (just celebrated the 50th anniversary of this on Dec 9th). This is critical because 
one will note — as Bret Victor pointed out in a reflection about Doug in 2013 — that 
both the intent and much of the reality of the demo was not really a precursor of 
today — much of it went considerably beyond the watered down stuff available today. 


But then by reading the above document, one can see just how much more he — and 
they — were thinking about that didn't get implemented and shown in MOAD. 


Let me just offer a few properties of how the ARPA/Parc research community did 
“problem finding". 


1. the funders were willing to fund “problem-finding”. It has to be done
sometime, somewhere, and it requires support, and often a fair amount of
work and time.


2. the researchers were not initially concerned about whether something could
be immediately implemented. This opened up thinking to a much wider
range of possibilities, and “final goals” could be 30 or more years off.


3. a lot of the thinking and explaining were done in terms of detailed
“scenarios”: stories and walk-throughs of what the future could be like with
the new ideas. For example, there are two such scenarios in Engelbart's 1962
framework, one of them quite detailed and long.


4. if a scenario wound up being “really desirable”, then the questions about
feasibility could be extended outwards in terms of decades — say 30 years.
Would it be possible then?


5. If so, then the idea could be brought back from the future to see what part
could be done 10-15 years out, and if that had substance, then one could
use Moore's Law (1965) and its precursors to see what could be done from
the present to about 5-8 years out using “supercomputing”.


6. Much of the early 60s ARPA research was done using very large computers
with single users to manifest the ideas as real-time interactive systems. The
ones of these that were good intensified the desire to make them practical.
For example, Ivan Sutherland's “Sketchpad”, RAND's “GRAIL” system,
Engelbart’s “NLS”, etc.


7. A good modern example is Bret Victor's “DynamicLand”, which he gestated
for many years with detailed scenarios before taking a shot at putting a
group together and making a subset of the ideas into a real-time working
version. This is quite a wonderful example of how to do “vision-making” and
“problem-finding”. Bret needed the great funders of the past (and that kind
of funding doesn't seem to exist today).


* the original scan can be found by typing "doug engelbart augmenting human 
intellect pdf" into Google (e.g. Bret Victor carries it on his site) 


What is creativity? 


Creativity is seeing wonderful things that most people can't see. 


Alan Kay
Still trying to learn how to think better - 4y


How did you learn to play the piano with both hands independently? 




When I started playing keyboards as a kid I was exploring how chords and melodies
went together, and the approach suggested to me was to learn “the three chords” as
three note chords with minimal movement of the left hand, and see how these would
work with many melodies.


My mother had a Baldwin “Orgasonic", and its primer had the following shapes for 
“the three chords” in the left hand in the key of C: 


[Keyboard diagrams: left-hand shapes for “the three chords”, with labels including C (the I chord) and F (the IV chord)]


This had minimal finger movements so was easy to memorize and produce.


Then there were lots of pieces in the key of C with these chord shapes that one would
play with both hands. (The Baldwin primer suggested that one should “very soon”
incorporate playing the root notes of the chords with one's feet.)


There are lots of things that can be learned about important parts of music and
playing by expanding on this. I used it a lot when learning guitar and figuring out
chords and how they work in all the keys, etc.


Some years later when I was learning piano “for real”, I asked a teacher about good
ways to get sight reading down, and he suggested “the 5 foot stack” of
“everything”, but to start with hymn books with 4 part arrangements, and to spend
15-30 minutes each day just “moving on through them”. Eventually, one can “see”
how they will sound and one’s fingers will all go where they need to. (This is very
much like the left hand idea above, but for both hands playing shapes, and more
kinds of shapes.)


Alan Kay
Still trying to learn how to think better - 4y


I can tune a guitar by ear without any reference notes. Does that mean I
have perfect pitch?


“Perfect pitch” is actually “good pitch memory". 


We know this from (a) accounts of musicians in the past who could tell or sing pitches 
at will, (b) combined with the knowledge that the “standard pitch” in the past was 
different at different times. 


For example, in the 17th century there were a variety of standard pitches for A,
including the nice low one of around 392 (some organs even at 380), the “baroque
pitch” of around 415, and the high pitches on some organs of 467 (or higher, maybe to
save tin!).


The pianos in Mozart's time were around 430, and piano tuning in the late 19th and 
early 20th centuries was set to 435. 


435 is a little flat from 440, but 415 is a half tone flat, and 392 is two semitones flat 
(the A sounds like a G today). 
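These distances can be checked with the usual cents formula, where 100 cents is one equal-tempered semitone. A small sketch (the function name is just for illustration):

```python
import math

def cents_from(reference_hz, pitch_hz):
    """Signed distance in cents from reference_hz to pitch_hz."""
    return 1200 * math.log2(pitch_hz / reference_hz)

for a_hz in (435, 430, 415, 392):
    print(a_hz, round(cents_from(440, a_hz)))
```

Relative to A=440 this gives about -20 cents for 435, -40 for 430, -101 for 415 (a half tone) and -200 for 392 (two semitones, i.e. today's G).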


Pitch memory can be learned by those who don't just take to it as children. There are
various courses available. It was the fad when I went to music camp in the 50s, and I
learned it. But it was quite a bit more trouble than it was worth for playing jazz in bars
with old pianos that had been allowed to slip, for playing transposing instruments (like
trumpet, clarinet, etc.), and for baroque instruments such as harpsichords. In all cases
the extra interference wasn't worth it, so I unlearned it.


Most trained musicians prefer the development of acute relative pitch. This is a 
floating version of “perfect pitch”. There is a reference pitch, and all the other pitches 
are identified by remembering “where you are”. 


This includes “hearing the notes” as one looks at sheet music. With relative pitch one
can “‘see’ what things will sound like” (a somewhat bizarre phrase in English), and
there is no problem moving from a piano at A=440 to a harpsichord at A=415.


In some systems of musical teaching, especially for children, a relative language for
pitches — called “solfeggio” — is taught. This is “do, re, mi, fa, sol, la, ti, do” etc. (with
additional syllables for the sharped and flatted notes).


There are two very different uses of this. Many European countries identify the
syllables with notes on the piano: “do” would be “C”, etc.


For much music, this is much less useful than to identify “do” with the “tonic” note of 
the key one is in. In the relative use of solfeggio, “do” would be G in the key of G, D in 
the key of D etc. This allows one to “see/hear” where the notes are relative to the key 
one is in. 
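The relative (“movable do”) use can be written as a small lookup. This sketch assumes major keys only and spells every black key as a sharp for simplicity; the table and function names are made up:

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]   # semitones above the tonic
SYLLABLES = ["do", "re", "mi", "fa", "sol", "la", "ti"]

def movable_do(tonic):
    """Map each solfeggio syllable to a note name in the major key of tonic."""
    start = NOTES.index(tonic)
    return {syllable: NOTES[(start + step) % 12]
            for syllable, step in zip(SYLLABLES, MAJOR_STEPS)}

print(movable_do("G"))   # "do" is G, "ti" is F#
print(movable_do("D"))   # "do" is D, "ti" is C#
```

The fixed use described earlier is the special case `movable_do("C")`, no matter what key the music is actually in.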


This covers a lot of music. But there is also music that is not in terms of keys, and for 
this more general schemes are needed. 


Alan Kay
Still trying to learn how to think better - Updated 4y
What are your reflections on “The Mother of All Demos”? 


I've been asked to give talks at the 25th, the 30th, and the 40th anniversaries of
the 1968 “MOAD". 


Here are the remarks I made 10 years ago at the 40th anniversary:


Recently, I've done two presentations for the 50th anniversary (a) on Dec 9th at the 
Computer History Museum via Skype, and (b) for a Japanese conference 
commemorating the demo on Dec 10th via Zoom (right now it is available via 
YouTube — my talk starts at 28:22, is about 44 minutes long, and there is an
interesting Q & A session afterwards). 


Bret Victor wrote a very good comment when Doug Engelbart passed away a few
years ago. It contains the advice I always give to people about Doug:


(a) don't get distracted by what seem to be similarities around today to what was in 
the demo (because the intent of Engelbart and his people was quite different and 
much more lofty than the commercial dumbed-down technologies that are prevalent 
today). This means that the meanings of the terms as used today are not the meanings 
for these terms 50 years ago. 


(b) to understand what Doug was after, read the 1962 proposal that lays out a
conceptual framework for “Augmenting Human Intellect”.


Alan Kay
Still trying to learn how to think better - 4y
Musicians, how do you play your instruments for hours while staying 
entertained? 


To a musician, music is life itself. “Entertainment” doesn’t come close to the “merging 
engagement” one has. 


On reflection, the answer above didn’t take into account the range of “how life feels to 
different people”. So, trying again: 


To a musician, playing music has much of the joy of falling in love and merging with 
one’s beloved. 


Alan Kay
Still trying to learn how to think better - 4y


Out of the inventions to come out of Xerox PARC, which were the most 
grossly caricatured by others and which were the most thoughtfully done? 


Does “caricature” imply intent, or should we consider it just an observation of the
result of sometimes blind processes?


It's hard to caricature the laser printer, and in fact the rush to make lower-cost,
low-volume printing alternatives — e.g. Ink Jet — resulted in pretty high quality devices for
the most part. 


The progression of the Ethernet has been interesting, because current versions of 
networking have much more switching than the original Ethernet at Parc. I think most
of these additions/changes have been various kinds of engineering decisions where — 
in small enough domains — various additional control does not get in the way of the 
“eventually perfect transmission” design aims. 


The main parts of the programming style that is called “object-oriented” today have
much too much of “simulating data-structure and procedure” programming. I would
call it — at best — “abstract data types” and not “in the spirit of ‘real OOP’”. A ‘real
OOP’ progression from Parc would very likely have been in the spirit of Linda, and with 
many of the elements that we find in Erlang, and moving towards a more declarative 
and knowledge-based system. 


But this would also retain the Parc ideas about “simulation” (fits in with “knowledge- 
based") and to use a full version of McCarthy's “fluents” to “also simulate ‘time’ ” to 
allow both state and “transformations and changes that can be reasoned about" to be 
present (I have written about these ideas elsewhere in Quora).


An equally large departure from the Parc style of ‘real OOP eternally running’ is 
today’s reversion to (or retention of) “simulated IBM cards” for programs fed to
compilers and loaders with from-scratch system builds (this is painful to see today —
or any day — it misses so much). 


But, for a short answer like this, I think I would pick the intent of the GUI — and of
ARPA/Parc’s intentions for personal computing itself — as the most misunderstood
Parc inventions. I wrote an essay in The Art Of Human Computer Interface Design
(in 1989) that gave a few of the ideas behind that design.


Basically, the idea was that a UI is not just about “control panels for nuclear reactors", 
but is “an environment for learning what can be done and how to do it”. This means it 
has to be not just explorable, but also open to experimentation, and this means that you
have to provide really good UNDO facilities. It also means that the UI needs to 
manifest “things to learn and how to learn them”. 
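
To make the UNDO point concrete, here is a minimal sketch of the kind of mechanism that lets a user try something and safely back out. It is a generic undo/redo stack, not a description of how the Parc systems did it, and the `Document` class is hypothetical.

```python
class Document:
    """A toy text document with unlimited undo and redo."""

    def __init__(self, text=""):
        self.text = text
        self._undo = []  # earlier states, most recent last
        self._redo = []  # states that were undone and can be redone

    def edit(self, new_text):
        self._undo.append(self.text)
        self._redo.clear()  # a fresh edit invalidates the redo history
        self.text = new_text

    def undo(self):
        if self._undo:
            self._redo.append(self.text)
            self.text = self._undo.pop()

    def redo(self):
        if self._redo:
            self._undo.append(self.text)
            self.text = self._redo.pop()


doc = Document("draft")
doc.edit("draft, with an experiment")
doc.undo()       # the experiment costs nothing to back out of
print(doc.text)
```

The design choice worth noticing is that every change is reversible by default, so exploring the system is never riskier than reading about it.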


It is critical to be able to integrate anything with anything when that will help the end- 
users. The Parc GUI was very different here: it had unlimited “desktops” that could 
persist over time, each of which could manifest and combine any and all of the useful 
objects in this system. There were no “apps” per se. 


The viewing mechanisms were general via “windows” having boundaries or not, and 
used as much for multiple perspectives as for simple viewing. There were no separate 
provisions for “pictures” etc. 


Several other important considerations were to be able to “pop the hood" of “power” 
wherever found, to understand it, and to be able to safely add and modify it. 


What could be done at Parc ca 1978 can be seen in this presentation using it that I did
for Ted Nelson a few years ago.


No great combination of all of the good ideas from Parc and Apple was ever put
together through the 80s, but Apple added quite a bit in many ways, while missing
some of the good stuff that had already been done. (A Hypercard on a Smalltalk 
framework would have been a really great advance.) 


Today, what can I say? A smartphone won't teach you gestures that could make you
much more powerful and efficient. Smartphones don't generally UNDO (you go back 
out to the top and start over). And so on for both personal computers and phones for 
many more dreary paragraphs. Most apps couldn't be more stovepiped and further 
from integrating with others ... and so forth. 


A simple way to look at it is that marketing people don't like learning curves, so they 
put “training wheels on the bike” and don't tell anyone they are on. So people 
optimize a “non-bike" in some of the most ridiculous (and worse: detrimental) ways. 
This results in something like this: 


but in the computer case, perhaps with square wheels or flat tires. (I would call this an
“extreme caricature of ‘bicycle’”!) Putting it on tracks where someone else has chosen
the directions would be even more apt. 


The best way to learn “real bike” is with “balance bikes” such as: 


The important thing is to learn to balance and to turn into possible falls. Many “not 
quites” can safely be made because the feet are near the ground and can catch the 
fall. The less important part of bike riding — the pedals — are gracefully added later. 


You can understand something about humanity, marketing, persistence of bad ideas, 
resistance to good ideas, problems with teaching and learning, and how all of these 
relate to public computing by looking at how bikes and bike riding are still conducted 
today. 


So, the first question to always ask when doing UI design is “what is the ‘bike’ ?” (this is 
what you don't want to lose, and is the eventual aim). Then you can ask “what is the 
‘balance bike’ ?” (this is a real abstraction of the real thing that is a good place to start). 
If you start with “what can people learn?”, you might come up with a correct but 
terribly debilitating solution (as have so many attempts over the last several decades). 


I don't think coming up with “Guitar Hero-like” non-guitar UIs that claim to allow you
to touch “guitar” is good for anyone, and in the large they are disastrous for humanity. 
A little fantasy is a good thing, but human destiny requires us to get better at 
constructing maps for “realities” that are better matches to our actual situations (and 
inventing and using user-interfaces that help us learn how to do so). 


Alan Kay
Still trying to learn how to think better - Updated 3y 


Is "dark mode" (light text on a dark background) really better for the eyes? 
Originally Answered: Is a “dark mode" (light text on dark background) really better for the eyes? 


Note: I only answered the original question. Some other agency changed the
question without my permission, and my answer below doesn’t apply directly. 


At Xerox Parc in the early 70s, the Alto’s video display could be used for either, so we 
did some tests. Those tests indicated less eye strain with predominantly white 
background under normal ambient conditions, because your eyes would retain 
roughly the same level of light-dark adaptation going from looking around the room, 
at paper, etc., and looking at the display. 


A dark close up display will induce dark adaptation while looking at it and light 
adaptation when looking elsewhere. The Engelbart researchers had earlier come to the 
same conclusion for NLS and went for black text on white screen (and pretty likely for 
the same reasons — they also thought of their work-station-terminals as being part of 
larger daily life). 


Bill English and an NLS terminal ca 1968

So the general answer to this question seems to be "No".


It is possible that in a very dark room a mostly dark background with lighter text
on a display might be easier on the eyes (but this would imply a kind of troglodytish 
existence that is likely not healthy for human beings ...) 


Alan Kay
Still trying to learn how to think better - 5y


Why does the American educational system seem to lack confidence in 
teaching higher-level mathematics in primary schools, where in many other 
countries, math is embraced instead of avoided? 


“higher-level mathematics"? “in many other countries ‘math’ is embraced"? 


I'm not sure what the italicized terms are supposed to mean here, especially with 
regard to children. Let's go for the non-pop culture meaning for the first, and with 
Seymour Papert’s and Jerome Bruner’s intentions for the second. 


The pop-culture tends to use the terms for “turning the crank on calculations involving 
numbers — usually just arithmetic — and occasionally for getting stuck at algebra”.


Bertrand Russell said that maths are “p implies q”; John von Neumann defined maths 
as “relationships about relationships”. We can use the idea that maths are ways to 
represent ideas and transformations so some coherent reasonings can be done in terms 
of the representations alone in ways that are completely understandable. 


Most school systems in the world do not teach children “real math" as practitioners of 
mathematics would recognize it. For one thing, the understanding part is almost 
always left out, and replaced by memorization of patterns to be recognized and rules 
to be followed. Different school systems carry this out with various degrees of 
effectiveness. 


A bug that has shown up a number of times when needed reforms are demanded — 
especially in science and math — is to try to teach children watered down versions of 
the adult subjects. A particularly egregious example was the "New Math" (starting with 
set theory) that came out of the educational reform movements in the 60s after 
Sputnik. Part of the confusion was the conflation between understanding and usage 
(mathematician Marvin Minsky quipped that the problem with New Math was that you 
had to understand it every time you did it!) In Daniel Kahneman’s terms, mastery of a 
subject involves being able to get fluent in both “Fast thinking” and "Slow thinking”. 
We can see this problematic conflation in parts of the “Core Curriculum” approach to 
“math” (especially arithmetic). 


In any case, the key question is how to get children going on the “real things”. The 
kinds of thinking that children can and can't do at different developmental levels will 
often mean that “what adults use” is not a good choice for children’s minds. Watering 
down loses “real”. The remedy is to invent “children’s versions of ‘real’ ". This is often 
not easy, but the process can be great fun to do (and a way for adults to re-engage 
more deeply with some of their favorite subjects). 


For curricula, both the slow-thinking “understanding the subject” and the fast-thinking 
“being fluent in the subject” skills need to be devised. 


Seymour Papert was an excellent mathematician who had also spent 5 years with Jean 
Piaget, and wound up with a very good nose for the kinds of things children could 
actually do at different developmental levels. His ideas revolved around inventing 
kinds of “real math” that children could do, and that some really profound and fun
versions could be created (e.g. forms of calculus involving the differential geometry of
vectors, reasonings in terms of sequences of actions, etc.) if the interactive computer 
was included in the “tools for thinking”. 


The power of this approach is still not at all understood, mostly because hardly 
anyone in the math teaching loop in the US understands “real math” — and those few 
who do, have difficulty understanding the minds of children. 


Similarly, Jerome Bruner challenged US education with “Any subject can be taught to 
anyone in an intellectually honest fashion if you heed their level of development"; 
another way to say what Papert advocated. Bruner backed this up by heading a large 
amazing curriculum design for “real science” for 5th graders ("Man, A Course Of 
Study”: real anthropology for 11 and 12 year olds). 


The basic idea is that “children’s versions of the subjects that retain the hearts of the 
matters” need to be invented for most of the subjects that are very different from 
commonsense usage. 


In the more recent “coding for all” efforts that are aimed at schools, many of the 
things that should have gone into the goals and standards — known from 40 years of 
past experience — were omitted because the committees doubted that teachers could 
understand them. In education for the benefit of children, putting adults first is a high 
moral crime! 


These efforts by a pop-culture involved with fads, desire for vocational certification for 
future jobs, and comfortable with (and even desiring) the substitution of labels for 
content, have pretty much killed every tinge of “actual education” in most areas, 
especially the ones that involve technique. 


A key idea that escapes the pop-culture (partly from blindness and partly via denial) is 
that many of the recent inventions that have changed our world require a qualitative 
shift in how the world, and the knowledge about the world, is perceived (the $5
phrase is “epistemological stance”). Humans have been on the planet for several
hundreds of thousands of years, so even the invention of agriculture is relatively 
recent (about 3% ago). Writing: about 1.5% ago. The modern idea of math dates to 
about 0.75% ago. Modern science: about 0.15% ago. Modern engineering is even 
more recent. 
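
The percentages above are easier to feel when turned back into years. The sketch below does only that arithmetic. Note the loud assumption: “several hundreds of thousands of years” is taken here to be 300,000 years, a figure chosen for illustration and not given in the answer, so the results are rough orders of magnitude.

```python
# ASSUMPTION: the total human time span is taken as 300,000 years for illustration.
ASSUMED_SPAN_YEARS = 300_000

# Percent of the span ago, as quoted in the paragraph above.
PERCENT_AGO = {
    "agriculture": 3.0,
    "writing": 1.5,
    "modern idea of math": 0.75,
    "modern science": 0.15,
}

def years_ago(percent, span_years=ASSUMED_SPAN_YEARS):
    """Convert a percentage of the span into a number of years."""
    return span_years * percent / 100

for name, percent in PERCENT_AGO.items():
    print(f"{name}: roughly {years_ago(percent):,.0f} years ago")
```

With a different assumed span the absolute numbers shift, but the ratios between the four inventions stay the same.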


We humans traditionally learn our epistemological stance in our world early in life 
from the surrounding culture and its impact on our ad hoc reasoning abilities via our 
genetics. There are various levels of commitments to these stances around the age of 
7 that get more difficult to change for most humans as they get older. 


When we talk about “humanity” the mass noun implies a distribution of variance, so 
the early commitments will be firmer or looser at the individual level — but it's a 
reasonable generalization to say that for “most” children “much" of their world view, 
values, beliefs, etc. are formed early in life. 


This means that for most children the most important education for a “modern 
epistemology” has to happen in the early years. And this means that the most 
important human influencers — the parents and teachers — need to embody the new 
epistemology. 


But generally they don't. And, since adults tend to try to recapitulate children in their 
own image, we get generation after generation of “non-modern” parents and 
teachers. (And voters.) 


In theory, teachers should embody the most important epistemological ideas in a 
culture. This would be a positive route to having the schools be a stronger influence 
for new ways of thinking than the home — and in a few generations to produce more 
sophisticated parents (and easier starts for teacher training). If a generation is about 
25 years, then about 9-10 generations have already gone by in the US since 
(especially) Jefferson advocated this as being a prime goal for public schooling (some 
progress was made but for the country as a whole, not enough). 


But this was the case for the teachers that Montessori trained, and it was one of her 
expressed aims that her schools embody the new ways of thinking as a culture ("You 
can't teach the 20th century in a classroom!"). 


This is not the case for US elementary school teachers (not even close). In the US, it is 
a commonplace notion that any adult knows enough to advise a 5 year old (this is not
remotely close to being true). In fact, in our world, the advisors for young children
need to be the most sophisticated adults. 


A society that understands this can readily afford to pay for it, and get it. Ours doesn't, 
doesn't, and doesn't. 


Alan Kay
Have designed a few programming languages - 5y


When you were learning assembly language, what assembly language did 
you study? 


My first real machine for programming (and learning assembly coding) was the IBM 
1401 in the US Air Force in 1962. 


(After subsequently learning a few more machines, I realized that there was a wide
variety of approaches to basic architecture. This became much more standard 15 or 20
years later, but I think programmers today would find the first three machines I
learned to be surprisingly different from anything in use today.)


In the early 60s most programming on most machines was done in assembler for
three main reasons: (a) they were slow and small and pretty much everything had to be
optimized for speed and space (for example the 1401 we had was just 8K 6-bit bytes
plus tape drives), (b) the few “higher level” languages around (like FORTRAN) that
could do a good job of optimization took a lot of machine time to compile, and (c) most
programmers thought of HLLs as “for sissies” (sound familiar?).


Still, IBM was forcing HLLs, and this meant that it forced its technical people to
implement both FORTRAN and COBOL on every machine it sold, no matter how small
(being able to run compatibly across the many machine types was one of the big
selling points for HLLs in those days; they weren't very “high level”).


The assembly language used by the 1401 was Autocoder — a macro assembler — that 
I realized later was unusual in its powers when compared to most others. The big
added feature in Autocoder was the ability to “tailor” macros. This was done by 
putting on each line of code little expressions that would be run at assembly time — 
most of them testing a macro parameter — that would decide to include the line of 
code or not. You could think of a 1401 macro as a “heuristic” piece of code that would 
expand into the minimal instructions needed to do the task given the parameters 
supplied. (This could get quite intricate.) 
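
The idea of a “tailored” macro can be sketched in a modern language. This is not Autocoder syntax and the instruction mnemonics are invented. It only illustrates the mechanism described above: each line of the macro carries a small test that is evaluated at expansion time against the macro's parameters, and the line is emitted only if the test passes.

```python
# Each entry pairs an assembly-time test with a line template.
# HYPOTHETICAL macro and mnemonics, for illustration only.
SCALED_MOVE = [
    (lambda p: True,            "LOAD  {src}"),
    (lambda p: p["negate"],     "NEG"),           # only if the caller asks for it
    (lambda p: p["scale"] != 1, "MUL   #{scale}"),  # skipped when scaling by 1
    (lambda p: True,            "STORE {dst}"),
]

def expand(macro, **params):
    """Emit only the lines whose test passes for the supplied parameters."""
    return [template.format(**params) for test, template in macro if test(params)]

# The simplest call expands to just two instructions.
print(expand(SCALED_MOVE, src="A", dst="B", negate=False, scale=1))
# A more demanding call pulls in the optional lines.
print(expand(SCALED_MOVE, src="A", dst="B", negate=True, scale=10))
```

The payoff is the one described in the text: the same macro name produces the minimal code for each particular use, with the decisions made before the program ever runs.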


Subsequent machines I learned — that required assembly language programming —
were the Burroughs 220, the Control Data 3600, and the Control Data 6600 (the 3600 
being the most standard architecture). Then the DEC series of machines, etc. etc. and 
finally the Xerox Parc Alto, where we programmed its microcode in assembler. 


The big exception was the Burroughs B5000 (ca 1962) that had an HLL architecture in
the hardware, had an HLL systems language (an ALGOL extended for systems
programming), and was never programmed in assembly as far as I know. I learned this
machine in the Air Force but never got a chance to actually use it (Burroughs was late
in delivering, and I had moved on by the time they did deliver). However, it became
the basis for many “byte-code interpreters” for HLLs over many subsequent decades. 


The 6600 supercomputer in 1965 was perhaps the first RISC architecture; its
inner CPU ran 10 times faster than the main memory (100 ns vs 1 µs) with parallel
arithmetic, fetching and storing, and a small kind of cache for sorting out sequences. It 
had a very rudimentary assembler, and the programmers of this machine had to do a 
lot of work to keep the main memory cycling and the tiny cache not exceeded. It took 
quite a while for the FORTRAN on this very fast machine to create code that would 
even run faster over all than on the 3600. (The 6600 later became a favorite target for 
computer scientists trying to come up with general optimizers for arbitrary machine 
architectures.) 


Alan Kay
Still trying to learn how to think better - Updated 5y


What is the "proper" technique to play a classical guitar without fingernails 
(if there is one)? 


Some pointers. First take a look at lute technique over many centuries, and you will 
find that most lute playing of all styles and eras was done without nails — the main 
exceptions were a few players, especially those who mainly played “continuo” 
(background chords and figurations in groups), where more volume helped and less 
subtlety was required. There is quite a bit known — much of it written down — about 
how to get many different kinds of tone and volume on a lute using fingertips. 


Second, take a look at the early phases of guitar, especially the 6 string guitar starting 
around 1800, and you will find that most players played with their finger tips and not 
with their nails. For example, both of the two main virtuosos in the early 1800s — Sor 
and Giuliani — played with fingertips. The great player and composer of the late 19th
century — Tarrega — started with fingertips, went to nails (partly for concert 
projection reasons) and then retooled back to fingertips for the rest of his career. 


Segovia was very opinionated, had a large influence, and castigated all players who 
didn't use nails. I think this set the current nails-only idea (and also was a big
influence in the rather heavy, big, and loud instruments that are common today). 


I had to quit classical guitar about 5 years ago because of a ruined left thumb joint. A
few months ago — accidentally — I tried a lute and found that the bracing pressure
on the thumb was much less (partly because of the shorter scale). This got me
going again on shorter scale instruments, both lutes, and “romantic era” guitars like
they had 200 years ago. I found quite a bit to learn about producing tone with
fingertips from both lute players and the subset of classical guitarists who use
fingertips. I am just a few months into this process, and there is much more to learn
than I currently know.


But: wonderful tones of wide varieties can be produced with fingertips, and quite a 
wide range of volumes. There is some overlap in the sounds that can be produced, 
and there are some effects that only can be made with one technique or the other. 
YouTube has some good examples. For example, take a look at the guitar videos of 
Rob MacKillop. Besides some beautiful renditions of pieces, he also has tutorials about
how to create great tone with fingertips. 


I should mention here that I got very uninspiring tones when I first started. Rob
MacKillop — who is a long time lute player as well as guitar player — says that it took
him about 3 years to really develop his technique. 


Some of the lute technique does apply — especially having the right hand so the 
fingers are more along the length of the strings rather than the right angle that is 
normal for nails playing. A 45 degree angle is a good place to start. Having the guitar 
a bit higher helps get a better angle with the right hand and arm. 


Here's a very good lute player who is also an excellent early 19th century guitarist — 
Anna Kowalska — playing a waltz by Agustín Barrios using fingertips. She is playing
on a copy of a ca. 1820 French guitar. You can see that her right arm and hand 
position is more lute like than standard classical guitar. 


As with the nails technique, one “pulls and slides off” rather than “plucks and pops”. 
The pulling and sliding with fingertips is done at more of an angle, and the pull is a
bit more upwards. I found that about two weeks to a month starts yielding some very
good sounds, especially at lower volumes. This is a good place to start. I have been
playing classical guitar “grilled cheese sandwiches” — the small Tarrega and Bach
pieces, Barrios’ “Julia Florida”, “Les Barricades Mystérieuses”, etc. — i.e. the easier
pieces one could play in a concert, but without excruciating technique required. 
Memorize these, and you can put all of your attention into listening to the sound, and 
trying things day after day. Eventually, the hands home in on producing much better 
sounds. 


Getting pearly sounds at higher volumes is possible, and I'm still learning how. As with 
the lute, the "good" fingers for tone are the thumb and middle; the index is better for 
the weaker beats. The ring finger actually works quite well for top notes of 4 note 
chords (especially if it has gotten strong from previous playing). 


We should not be surprised that different people have differently shaped fingers and 
pads, so part of the process is to discover what works for you (this can be tricky 
because nothing works very well in the beginning). One tip I got from a video of Rob
MacKillop is to try different nail lengths, even if just playing with the pads. This is
because, for some finger shapes, the nail will support the pad for better (and worse)
effects. I found that “not completely clipped off” makes for a better sound for me. This
requires a little work, but not in the same ballpark as the careful contouring needed 
for playing with the nails. 


A big discovery for me when starting to learn the lute was the increased deep sense of 
intimacy with fingertips — this even though when you play with nails you do start by 
contacting the string with both nail and fingertip. With just fingertips, you have the 
sensation of touching the string directly with both hands, and there is just music to be 
brought out in between. 


The other part of this is that both real gut and synthetic gut strings at the lower 
tensions and shorter scale lengths have special qualities for fingertip playing. Part of 
this is the nature of the material itself and the kind of tone produced, and part is that 
the texture of these strings is not super smooth like nylon, but is a little “toothy”. This 
allows a lot more control with the right hand of just how you are pulling the string and 
releasing it. And the toothiness makes a big difference with the left hand for getting 
much more “grab” and control of more kinds of vibratos than are easy to get with the 
smoother strings. 


To me, the main reason to deal with the much larger technical demands of fretted
instruments compared to keyboards is that you can get much more expressive sound
and ranges of sound. Fingertip playing really delivers!


Alan Kay
Still trying to learn how to think better - 5y


Would you rather have three $1000 guitars with various characteristics 
(Strat/Tele/LP/semi-hollow, etc.) or one $3000 top-of-the-range guitar?

The question is sort of aimed at the electric (even rock) guitar world, but it might be
useful to have a perspective from the classical guitar side of things. An additional bit of
context is that I have a “nice number” of very good electric guitars of various kinds.


The first thing about high end classical guitars is that they also have “various 
characteristics", especially along the dimensions of tone and action, and to a lesser but 
useful extent in size, and even shape. Most serious classical players have more than 
one really good instrument for similar reasons to electric players, because certain 
kinds of music can fit certain instruments better than others. 


It's not that you can't get a very good classical guitar from a luthier for under $5000
(you can), but here, let's start there and look towards (say) $12K or so. I'm guessing
that most classical players with this budget would go for one really great instrument, 
and for the very same reasons that violinists, pianists, etc. would. 


In this range, the price is not automatically a measure of the goodness of the 
instrument. This is because, as in the electric world and the violin world, there are
fads, famous makers, etc. The good news is that there are some really
cosmic classical guitars in this price range, and some of them are closer to $5K than 
$12K. 


I want to echo and enlarge a point made by one of the other answerers: a really great
piano (and not a concert grand either) these days is more than $100K (e.g. a 
Boesendorfer 200 and above). You can get a really great guitar for less than 1/10th of 
a great piano (and you can take it with you everywhere). This is one of the great 
musical bargains of all time. 


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 5y


In the early days of the computer mouse, what were the arguments for and 
against making it one-button versus having multiple buttons? 

The mouse was invented by Doug Engelbart and Bill English in 1964 for their 
proposed “oNLine System” (NLS). There were a number of other pointing devices 
around at that time, including the light-pen/“gun”, and several kinds of track balls.
Invented the same year was the RAND tablet-and-stylus (it was really a great piece of 
engineering and worked wonderfully well, albeit for a very high cost). 


The vanilla NLS input devices around the time of the “Mother Of All Demos” (50th 
anniversary on Dec 9th this year) were a typewriter-like keyboard, a “chord keyboard”
for the non-dominant hand, and a mouse with three buttons for the dominant hand. 


The idea of this scheme was that “all things should be doable” while navigating the 
hyperspace — “both hands out” — or typing large amounts of text “both hands in”. 
This way one did not have to switch back and forth between pointing and inputting. In 
practice, the middle typewriter keyboard was just used for bulk text of paragraph size 
and larger. 


All text and all commands could be issued with “hands out”. The five “chord keys” on 
the left plus the first two mouse buttons provided 7 bits of input (the system 
remembered the down keys — so you didn’t have to be perfect — and would send 
the whole byte when all keys came up). The rightmost button on the mouse (the
user's index finger is over it) is “command accept”.
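
The “remember the down keys, send when all keys come up” behavior can be sketched as a tiny state machine. This is a guess at the logic from the description above, not the NLS implementation. The bit assignment (bits 0 to 4 for the chord keys, bits 5 and 6 for the two mouse buttons) and the `ChordInput` class are assumptions for illustration.

```python
class ChordInput:
    """Accumulate keys pressed during a chord; emit the code when all are released."""

    def __init__(self):
        self.down = 0        # bitmask of keys currently held
        self.remembered = 0  # bitmask of every key pressed during this chord

    def press(self, bit):
        self.down |= 1 << bit
        self.remembered |= 1 << bit

    def release(self, bit):
        """Return the 7-bit chord code once the last key comes up, else None."""
        self.down &= ~(1 << bit)
        if self.down == 0 and self.remembered:
            code, self.remembered = self.remembered, 0
            return code
        return None


chord = ChordInput()
chord.press(0)           # first chord key goes down
chord.press(3)           # another key joins a little late
chord.release(0)         # one finger lifts early: nothing is sent yet
chord.press(5)           # a mouse button is added to the same chord
chord.release(3)
print(chord.release(5))  # the last key comes up and the whole code is sent
```

Because the code is latched on release, the fingers do not have to land or lift at exactly the same instant, which is what makes chording forgiving in practice.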


The command scheme was a bit more verbose than it needed to be (and there were a 
few reasons for this (another answer)). It was basically procedural with command verb 
first. 


• give command
• give kind of entity
• give other parameters
• command accept


There was a bit of a learning curve with this scheme, but you were rewarded by most 
executions responding in sub-seconds, so you could usually issue a few commands 
per second (worth comparing to the snail's pace of UIs today, except for scrolling,
etc.). 


One didn't learn “the bits” but started with learning basic editing and navigation 
commands: Move Character <this one> <there> <accept>, etc. This would get you to 
learn “m" and “c". 


A few hours would allow you to do most things, and about 10 hours were enough for 
comfort. Adepts could type 20-30 words per minute with "hands out”. 


In practice, this ran afoul of “value uncertainty” for most outside users (i.e. they said to 
themselves “should I put in the effort to learn this?”, and most wound up saying “no”).
This is basically the “violin problem”: it's incredible once you learn it, and very tough 
and unrewarding for quite a while. 


Even though NLS was a lot easier to learn than a violin (10 hours is really not a lot of 
extra time to invest), the “value uncertainty” was a big drawback in acceptance. When 
people complained, Doug would say “Well, people are going to be using stuff like this 
6 hours a day — they should learn an efficient interface”. Almost no one believed him. 


The later Parc Altos had both mice and the five finger chord keyboards. One of the 
motivations for designing a much simpler easier to learn GUI interface (originating 
with Smalltalk) was that we were working with children, and we had big goals for how 
they should allocate their mental resources between a UI and the content. (This is the 
scheme that Steve Jobs saw that was adapted for the Mac.) 


The object oriented UI of Smalltalk had selection of object first, then <command> 
(which also served as “command accept"). This was both modeless and more compact, 
and proved to be more intuitive for the children. 


A lot of experimentation was done with the three button Parc mouse. The Dynabook 
design required tablet/stylus (in part so the kids could draw things) but these were still 
really expensive, and the Parc “living lab” required things to be replicated for 100 users 
(for a personal computer, meaning: we needed to build 100 Altos*). So we looked at 
the mouse. The NLS mouse had real problems (used potentiometers, was noisy, etc.) 
so Chuck Thacker (I think) came up with a relatively small, very high resolution mouse 
using shaft encoders. This was really good, with the only drawback that it needed to 
be cleaned periodically. Eventually it was replaced with several varieties of “ball mice” 
and also Dick Lyon’s first optical mouse. 


There were numerous UI designs that used: 


• one button of the three button mouse (imitating a stylus/tablet). This was 
usually in conjunction with several different “gesture recognizers” like those 
in the GRAIL System at RAND.

• two buttons (this was taken up by the later Star office system product). 
Basically pointing-and-clicking and “properties”. (Sound familiar?) 

• all three buttons (this was used by most of the technical people: one for 
pointing and clicking, and the other two to bring up menus). 


Mixtures of all three of these are still in use today. 


Note that a good gesture recognizer has some of the traits of NLS. Basically, the 
gestures allow one to be very fluent and fluid with the UI, at the cost of learning them. 
(Today is terrible, where the UIs use simple gestures and don't teach comprehensive 
ones — this is kind of for 2 year olds and 92 year olds, and no one in between!) 
Experience with both NLS and GRAIL convinced us that a system with a really great 
gesture and writing recognizer still needed a keyboard (hence the inclusion of a 
keyboard along with the stylus and touch-sensitive display in the design of the 
Dynabook). 


It's worth contemplating the tradeoffs between having multiple buttons on the mouse 
vs “command keys” vs. having something like the NLS chord keyboard for more 
possibilities. 


Where we are today is that most people use their systems much more than the 6 
hours a day that Doug predicted, but they don’t want to do any concerted learning. 
This gives the worst of both worlds, in that people put in amazing amounts of time 
avoiding the smaller amounts of time needed to learn a real skill (this is “Humans 
101"!) This is basically a bicycle with training wheels where no one realizes there are 
training wheels — so almost no one ever learns to ride a bike! 


And most of the UIs today don’t provide any learning curves for the smaller 
percentage who can see that learning real skills would make a big difference. 


Alan Kay 
Still trying to learn how to think better - 5y 
What guitar has the best tone? 


Let's leave “best” as subjective, and look at ranges of tone as played by you (or by 
some very accomplished player). 


I want to discuss classical guitars, but first a Telecaster story. Some years ago, more or 
less by accident I picked up a Fender Telecaster 50th anniversary instrument (ash) and 
loved both the action and the sound, so I bought it. Teles have a huge aftermarket so I 
bought another 50th anniversary instrument to use to play around with gear 
substitutions. To my great surprise the second one I bought was not in the same class 
of feel and sound, even after some work (it was still “quite good”). 


Now for classical instruments. Guitar Salon International in Santa Monica has an 
enormous inventory of high end classical guitars, including many important historical 
instruments, and all in great playing condition. They have a nice big room with a high 
ceiling and great acoustics, and they will let you try out guitars all by yourself for 
several hours at a time via an appointment. I have bought several very good 
instruments here over the years. 


Each player has an internal image of “kind of how” a piece should sound, and they will 
aim for this sound. So a particular player will sound like “themselves” from instrument 
to instrument (this is amazingly true even on a good piano). 


But once you adjust to this, the differences between instruments, and to the point 
here, between really good instruments, are still quite large (similarly with pianos). With 
guitars, e.g. a Ramirez really sounds different than a Hauser. And both are very different from 
(say) a Michael Thames version of a Torres “La Leona” with a “tornavoz”, a kind of 
Helmholtz resonator made of brass that Torres installed in many of his instruments in 
the 19th century to reshape (marvelously) both the bass and the treble response. 


Some guitars want to sing (and have a lot of sympathetic ringing between strings) and 
others are more interior in sound (and need to be coaxed to sing). Some are bright, 
some are dark. And so forth. 


(With pianos, the differences between a Steinway, a Bösendorfer, and a Fazioli are 
palpable.) 


Classical guitars differ in the soundboard woods (spruce and cedar give very different 
responses) and there are now many kinds of “double tops” which have paper thin 
woods as the bread (maybe spruce for one bread and cedar for the other, with 
something in between, such as “nomex” or balsa wood). The latter tend to project a 
lot more but have a different tone. 


They differ in body woods (though Torres once made a great guitar with a papier- 
mache body to show that it was the soundboard that was doing the work). The 
bracing on the soundboard is another big alchemy. 


There is “tuning” of the sound board via graduated thickness. 


Finally, there is a very large range of kinds of strings that are available, from real gut 
(some of which is actually made carefully enough to be like the gut strings of old), 
synthetic gut of many kinds, nylon, “carbon”, and more and more (even Japanese 
fishing line has been used by one of the great guitar players of our time!). 


And so forth. Many of these choices actually do make a difference with the sound that 
a good guitarist can coax forth. Most players use fingernails on their right hand to 
pluck the strings, but there are whole schools of thought about this. And some players 
prefer to play with fingertips (as many players over the last hundreds of years have 
used on the guitar and its ancestors, and also for the lute). The choices here will in 
turn be aided by the types of strings preferred, and these also affect the “raw material” 
sounds of the instruments. 


In my view the differences are quite large and important, especially if one is very 
sensitive to sound. 


Alan Kay 
Still trying to learn how to think better - 5y 


What's a good introductory instrument to learn for someone who knows 
nothing about music? 


Let me suggest “the voice”. This is because “musicality” has to be developed internally 
for any kind of musical production to be “musical”. If you take lessons on another 
instrument, at some point the teacher will try to get you to sing what you are trying to 
play (and conversely, good players are always “singing inside” and use this to help 
expression on what they are doing externally). 


The need to take breaths and the connections with speech and prosody automatically 
provide the start of phrasing, and both ways to do it, and to ask good questions about 
it (when can I take a breath without breaking up the line?). 


It's also the case that most people can say and sing really complicated rhythms long 
before they can make them happen through an instrument. 


If you are playing an instrument that has flexible pitch — strings, winds, brass — then 
you need to hear what the pitch should be so you can make it. 


If you are playing an instrument that makes the pitch for you — keyboards, guitar, etc. 
— then you still need to be able to hear the sounds you should be making. 


If you are reading music, you need to learn to hear what “the dots” indicate (and with 
phrasing — as in reading in natural language). 


If you are playing by ear — you need to develop an inner ear. 


And so forth for many more reasons, perhaps the two most important of which are (a) 
“the music is not in the piano” — so it needs to develop in you, and (b) learning to 
sing will help you pay much more attention to what is actually going on in music (it's a 
lot more — and a lot better — than most people think). 


One final thought. One of the most noticeable things in teaching young children 
music is how musical most are via singing and dancing, and how introducing even a 
very simple pentatonic Kodaly instrument will cramp their musicality, sometimes to 
the point of discouraging them, perhaps forever. 


So a really interesting question should be: what are the best ways to help people learn 
to sing both on the outside and especially on the inside? 


Alan Kay 
Still trying to learn how to think better - 5y 


Recently I've been considering trading the first “real” guitar I ever bought 
myself for a Stratocaster. I've loved the sound of Strats for so long but I'm 
scared to give away my first guitar (it's a Telecaster). Should I trade it? 


I would urge you to keep it (and make a plan for how to get a Strat). This is because 
“bonding” with an instrument is more than learning to use a tool — it becomes an 
extension of your psyche, imagination, even soul. You don't want to amputate 
anything that does this for you. But you can always bring in another that extends you 
in other dimensions and bond with it also. 


Alan Kay 
Still trying to learn how to think better - Updated 2y 


What guitar surprised you the most (positively or negatively) when you first 
played it? 
Originally written ca. 2018 


I've played lots of different guitars over the last 65 years, mostly electric — especially 
jazz guitars — and somewhat intermittently — with some long stretches 
professionally, gaps for grad school and research (I'm the kind of player who has to 
practice a lot so gave up playing for a while) and then classical keyboards for years. 
Over the last 10 years it's been classical guitars with a recent 4 year break because of a 
ruined left thumb (age, arthritis, etc). 


Via trying a lute a few months ago, I've been playing again, now on shorter scale 
smaller 19th century style instruments with lower tension gut-style strings, fingertips 
instead of nails, and my thumb has just barely been tolerating this. But it's wonderful 
to be able to play at all. 


This got me to start looking for guitar makers who liked to be influenced by the 19th 
century — YouTube is a great resource. I found Michael Thames, who has been 
making a wide range of instruments for more than 40 years, many of which are 19th 
century influenced or “copies”, especially of instruments by Torres, who pretty much 
set the direction for classical guitar design after 1850. 


Many of these sounded fantastic on YouTube — but it's always difficult to tell, there is 
usually added reverb and equalization, etc. And a great player can really make a 
difference. Here's Matt Palmer playing a Michael Thames instrument that is a “copy” of 
a Torres “La Leona” guitar (using a “tornavoz”, a kind of internal Helmholtz resonator 
that Torres pioneered to lower the resonant frequencies of a guitar and to make the 
trebles “sing” — it is a brass tube around the sound hole and you can see it in the 
opening seconds of the video). 


Just on the strength of these, I contacted Michael Thames (a very fun experience) and 
he was happy to start making me several shorter scale versions of some of these 
instruments. 


A month ago I finally got back to LA and visited the amazing “Guitar Salon 
International” in Santa Monica which has an enormous inventory of top quality 
instruments (and where I'd bought several of my classical guitars, including a Ramirez 
1A, and a 1950 Rodriguez). 


They had a Michael Thames “La Leona”, but in a standard scale (650mm) and a neck 
size that is too big for me now. I tried it anyway. 


This was a completely surprising shock. It sounded shockingly better than any of the 
recordings on YouTube (and “kind of cosmic” compared to very good classical guitars). 
And it played better by far than any classical guitar I'd tried up to then. The action was 
nicely low, but it was the “seatability” of getting a string down to a fret that was so 
much better than on other instruments. 


They let me borrow it for a few days, and that was a few days of heaven. The tornavoz 
— as invented by Torres and reinvented by Thames — is what every classical guitar 
needs. It brings at the same time to the bass: depth, clarity, fullness, and “largeness”. 
The treble is even harder to describe, but it is definitely “singing” in many different 
special ways. The recordings do a better job rendering the treble than the shock of 
what the bass actually sounds like. 


This is why I quote the word “copy” — this instrument sounds better than the original 
“La Leona” (that is still around today). I think Thames’ instrument is a combination of 
the best from the past and from what is known today (especially by him). 


It was hard to give it back to GSI. The extreme playability of this instrument almost 
disappeared my current thumb and strength problems even on the more difficult 
stretches. The sound was indescribably wonderful. 


I'm looking forward to the new guitars by Michael Thames — the first in a few weeks. 


Still, I have “that feeling” about the one I got to play for a few days. 


I wrote the above a few years ago. Recently, Jeff Dwyer posted an interesting 
comment that I'm putting here with my reply: 


Jeff Dwyer: I sympathize with you on the thumbs. I've got about 6 decades of 
playing behind me too, some professionally, and a lot of physically demanding 
sports and jobs. My thumbs are both trash. I get cortisone injections in both CMC 
joints, but they're not working so well anymore. The MCP joints are also gone. Looks 
like surgeries in the future. Anyway, I feel your pain. I can’t play for very long these 
days, and I can't reach the chords I used to without a lot of grimacing. But I'm not 
giving up, and I hope you don't either. 


Hi Jeff: I wrote the answer a few years ago — this is being written in June 2021. 
Cutting to the chase, the two guitars from Michael Thames were terrific, especially a 
“Mirecourt copy” with two extra bass strings and a tornavoz (scale: 620 mm). This is 
just a really great guitar in all ways. 


I was able to play this for about 18 months until my left thumb couldn't continue. 


Meanwhile, | decided to really learn the lute (it turns out to be quite different from the 
guitar in most ways, including how “specific” it is about pretty much everything on 
both hands — more like the violin in that respect). There is much less transfer from 
guitar than I thought. I also started to explore the lute instrument world, and after 
quite a few instruments, completely lucked out when a Paul Thomson 7 course 
Renaissance lute (1997) showed up (he is the “Strad” of lute making, and the one I 
wound up with can only be described as “luminous”). 


It is wonderful to be able to learn and play this at age 81! The lute has an immense 
repertoire over many centuries. I think it has many more “intermediate level” pieces of 
first class musical quality and beauty than any other instrument. It doesn’t produce a 
lot of sound, but it has more “presence” than instruments like the guitar, piano, 
harpsichord, etc. It does what a low level brass choir would do (where two notes 
around middle C will fill a room). Here's a sample of a “tone piece” from ca 1520s (so 
500 years ago) that I did as a birthday surprise for my wife last year (good 
headphones advised). 


Alan Kay 
Still trying to learn how to think better - Updated Thu 


Why are there frets on a guitar? Wouldn't a guitar player be able to play 
from hearing? They seem more annoying than useful to me. 


I would like to drill deeper into the issues than the other answers. 

Issue 1 - playing in tune at all 


There's the question of “what does it mean to play in tune on any instrument — 
especially where harmony is involved?” On instruments that are capable of 
microtuning on the fly — like the violin family, the woodwinds, the brass, and the 
human voice, etc. — ensembles of players with good ears will tend to play or sing 
notes in chords so that the reference pitch of each note agrees with the *harmonic 
pitch* of the fundamental note of the chord. Let's call that chord “C” major, and 
pretend that the reference pitch of “C” is 100 Hz. Its *harmonic partials* will then be 
200, 300, 400, 500, etc. In the C major chord of C, E, G, we could choose the pitches for 
the E and G notes from the pitches of the harmonics in the root note C. So the 
reference pitch of the E would be an octave of the 5th harmonic 500 (so 125, 250, 500, 
1000, etc.) and the G would be an octave of the 3rd harmonic 300 (so 150, 300, 600, 
etc.). A closed chord C, E, G, C would then be 100, 125, 150, 200. 
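
As a quick check of that arithmetic, here is a small sketch that takes the 5th and 3rd harmonics of the pretend 100 Hz root and drops each by octaves until it lands inside the root's octave. The helper name `fold_into_octave` is made up for this example.

```python
from fractions import Fraction


def fold_into_octave(freq, root):
    """Halve or double freq until it lies in [root, 2 * root)."""
    while freq >= 2 * root:
        freq /= 2
    while freq < root:
        freq *= 2
    return freq


root = Fraction(100)                       # the pretend reference pitch for C
third = fold_into_octave(5 * root, root)   # 5th harmonic, dropped by octaves: E
fifth = fold_into_octave(3 * root, root)   # 3rd harmonic, dropped by octaves: G
chord = [root, third, fifth, 2 * root]
print([float(f) for f in chord])           # [100.0, 125.0, 150.0, 200.0]
```

The resulting ratios 4 : 5 : 6 : 8 are what "sweetly in tune" means for this chord.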


This is so nice sounding it would be great to get all the notes in our scales from this 
principle. We could start with F (which in this scheme would be an octave of 33.33) 
and “run the 5ths” to get F, C, G, D, A, E, B, F#, etc. This will give us serviceable “white 
keys". Except ... 


We do this by multiplying by 3s. So F = 33.33..., C = 100, G = 300, D = 900, A = 2700, 
E = 8100. If we start dividing the E by 2s to get its suboctaves we find that the E 
gotten this way is ~ 506, not 500! It is inharmonic. 


But if we use the one that is harmonic with C (E = 500) it will then be flat for what the 
5th should be for A (E = 506). 
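
The mismatch between the two E's can be verified with exact fractions. This is just a sketch of the arithmetic above; the variable names are made up for the example.

```python
from fractions import Fraction

names = ["F", "C", "G", "D", "A", "E"]
freq = Fraction(100, 3)          # F, chosen so that C comes out at exactly 100
stacked = {}
for name in names:
    stacked[name] = freq
    freq *= 3                    # "running the 5ths" by multiplying by 3

e_from_fifths = stacked["E"]     # 8100
while e_from_fifths >= 1000:     # drop by octaves until it sits next to 500
    e_from_fifths /= 2

e_harmonic = Fraction(500)       # octave of the 5th harmonic of C = 100
print(float(e_from_fifths))      # 506.25
print(e_from_fifths / e_harmonic)  # 81/80
```

The leftover ratio 81/80 is small, but it is enough to make one of the two E's sound wrong against either the C or the A.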


An even quicker way to see the larger problem is that “running the 5ths” (multiplying 
by 3s) cannot get back to an octave of the starting note (octaves are gotten by 
multiplying by 2s — and since 2 and 3 are relatively prime, no power of 3 can ever equal a power of 2). 
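
One common way to put a number on this (an illustration chosen here, not taken from the text above) is to stack twelve 3:2 fifths, which on a keyboard would bring you back to the starting key seven octaves higher, and compare the result with seven true octaves.

```python
from fractions import Fraction
import math

twelve_fifths = Fraction(3, 2) ** 12
seven_octaves = Fraction(2) ** 7
comma = twelve_fifths / seven_octaves      # how far the fifths overshoot

print(comma)                               # 531441/524288
print(round(1200 * math.log2(comma), 2))   # 23.46 (in cents, 100 cents = 1 equal-tempered semitone)
```

The overshoot is about a quarter of a semitone, and no choice of how many fifths to stack will ever make it exactly zero.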


Something has to give. The nice thing about the microtuning instruments is that the 
players can adjust on the fly by listening to the harmony. So e.g.: a cello player will 
play the Bb on the top string in a slightly different place if the surrounding chord is a 
Gm or a Bb major or a form of Eb chord — this is so the harmonics of the 
conglomeration will fit together as nicely (as “harmoniously”) as possible. (The players 
are not thinking of the names of the chords so much as listening to see if the whole is 
“sweetly in tune”.) A little thought will reveal some pernicious cases, but these are 
smoothed out in practice. 


We can see that a keyboard instrument is a real problem with just 12 pitches available 
in an octave. The problem has been dealt with in 3 main ways: 


(1) restrict the number of key centers and tune the notes for those key centers (so e.g. 
you might be able to play in the key of C, but not in the key of C#) — retune the 
instrument for different pieces if needed. The former has been used for baroque 
organs and harpsichords, and the latter for harpsichords, etc. These schemes of 
“unequal temperament” have sometimes been used in compositions to express 
different feelings in going from smoother to rougher chords and back. 


(2) put more pitches per octave on the keyboard (split the sharps, etc.) 




An Italian style harpsichord with 7 extra pitches per octave (i.e. both F# and Gb are 
available, etc.) for 19 total pitches per octave instead of 12. 


This is a pretty good solution, and the split sharps are usually almost as playable as a 
single sharp. (Some historical organs and recreations even have some of the sharps 
split in the pedal notes!) 


(3) tune the 12 pitches per octave *equally out of tune* with each other to 
compromise (this makes *every* major chord out of tune, especially with 3rds that are 
much too wide). This is the solution most in use today — its main benefit is to allow 
composers to move key centers without getting “wolf” notes, especially in chords. 
Keyboard instruments with lots of harmonics in each note (like organs and 
harpsichords) produce a kind of a “grit your teeth” sound, especially if the music 
(typically baroque) has been written with much purer chords in mind (for example, 
many Buxtehude organ pieces want “sunlight” for their ending major chords, and the 
equal tempered version of these does not have this effect at all). 


We can see that fretted instruments (with — say — 12 frets to the octave) will have 
the very same kinds of problems, especially too wide 3rds and too narrow 5ths if equal 
temperament is aimed for. Many of the early tunings for lutes were in the scheme of 
(1) above, and it was also with the lute that the first attempts at (3) were made. 
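
To see how wide and how narrow, here is a short sketch comparing equal-tempered intervals with the pure 5:4 third and 3:2 fifth, measured in cents (1200 cents per octave). The helper names are made up for this example.

```python
import math


def cents(ratio):
    """Size of a frequency ratio in cents (1200 per octave)."""
    return 1200 * math.log2(ratio)


def equal_tempered(semitones):
    """Frequency ratio of an interval in 12-tone equal temperament."""
    return 2 ** (semitones / 12)


third_error = cents(equal_tempered(4)) - cents(5 / 4)   # major 3rd vs pure 5:4
fifth_error = cents(equal_tempered(7)) - cents(3 / 2)   # 5th vs pure 3:2
print(round(third_error, 2))   # 13.69  (too wide)
print(round(fifth_error, 2))   # -1.96  (slightly too narrow)
```

The fifth is off by only about 2 cents, while the major third is off by almost 14, which is why the thirds are the ones that grate.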


Some of the early fretted instruments — lute, vihuela, etc. — had frets made from gut 
and tied on — so they were moveable. This allowed the player to nudge a fret to help 
an interval (at the expense of other intervals). A favorite one is the equivalent of the 
10th between the low B on a guitar and the D# on the 2nd string — the fourth fret 
can be angled a little flatter at the top to get the D# more in tune for pieces in E 
major and E minor etc. (in the lute the B is a D and the D# is an F#, etc.). 


I couldn't find a picture on this pass, but some lute players would stick a little 
something to a string that would act as a fret when you pushed it down there behind 
the regular fret. This gave the player two pitches to choose from for that note ... (I 
haven't tried this ...) I think this has also been done a few times on guitars. 


On a guitar the frets are fixed, and you can bend strings sharp but not flat (so you 
could tune a little flat and bend to the better pitches) — this makes chord tuning a real 
problem. Similarly, getting rid of the frets on a guitar allows microtuning but makes 
polyphonic playing in good enough tune very difficult (for most cases, more difficult 
than the net gain). 


But, some of the solutions to the keyboard can also be done on the guitar. Here is a 
guitar by a master builder in France that allows for adjustable microtuning, and 
especially for perfect thirds in certain keys. The little “fretlets” slide up and down in 
grooves. If this is “fixing some of the guitar tuning problems”, it is also an illustration 
of how large are the tuning problems on standard guitars. 


One of several solutions to “the frets tuning problem”. In this picture, the guitar is tuned 
to one of the (1) tunings which makes some keys really good and other keys not 
playable. 


Issue 2 — the problem of different diameters and tensions on fretted instruments 


Now we need to add the problem that the strings on a fretted instrument are not 
remotely similar to each other (and their partials are also not perfectly harmonic). One 
of the ramifications is that equal length strings of different diameters and masses will 
really be out of tune on any given set of frets. 


One of the compensations on a classical guitar is to angle the bridge. This doesn't 
work very well. More modern guitars will have both length and height adjustments 
individually for each string. (It's worth noting that this only works a little (the 
compensations will not work equally well for every fret). On classical guitars, the G 
string is just too fat as an unwound string (the partials are out of tune), and it has 
proven very difficult to make a wound version that will work better). 


Part of this issue is that the amount of pressure needed to get a string down to a fret 
increases as you go up the neck. This will sharp the pitch a little more and more, also 
adding to the out of tuneness. 


So: 


It is very hard to get a guitar into any kind of reasonable tuning! This will obviously 
bother some players very little (one hears really out of tune guitars all the time, even 
in YouTube demos by pros, etc). I think many players and listeners might even 
associate the out of tuneness with a kind of timbre. 


More musical players will have found schemes to compromise the tunings on their 
particular instruments, and will often compensate for the main harmonies they want to 
sound as much in tune as possible. Another compensation is to roll off the high 
harmonics — via nylon strings, the use of “jazz guitar boxes” designed to emphasize 
the tenor range, filters, etc. Use of vibrato and bending will get around some of the 
tuning problems. 


Still: 


The thing that is so wonderful about a guitar is how much expressive music can be 
gotten from just six strings in such an intimate setting, especially on an instrument 
that embodies some of what classical style instruments provide. I think the two main 
solutions that would help a lot in the future are: 


• Somewhat adjustable frets — with better mechanics for adjustment. I think 
this is a whole new area for design that has not been explored much at all 
since the time of the lute, and addressing the very same issues there. 

• Much better strings for both classical and electric guitars. In the 21st century, 
it should be possible to make materials that have the right strengths, 
densities, harmonics, etc. to provide a much more uniform response when 
put on an instrument. 

Added Nov 2023 

And: 


You could get a fretless guitar made, as Chet Atkins did and uses in this video with 
Jerry Reed and Pat Bergeron (also they get a great groove going in this one): You can 
hear the tradeoffs. 


Jerry Reed & Chet Atkins - "Summertime" (Live) 


Alan Kay 
Still trying to learn how to think better - 5y 


How did the discipline of science spread after its invention? 


A large question for a small space! On his landmark BBC/PBS show — “Civilization” — 
Kenneth Clark said: "People in the Middle Ages were as passionate about the Truth as 
we are, but their sense of evidence was very different”. The term “science” has meant 
very different things in different eras — it once mainly meant “gathering of 
‘knowledge’ " (which still requires a definition of ‘knowledge’). 


There is also the very large human problem of “cognitive bias” (lots of them), that 
includes “confirmation bias” (the strong tendency to see “evidence” that supports a 
pet theory, and to “not see” or downplay evidence against the theory). One way of 
dealing with that is to create a community, part of whose duties are to criticize and vet 
(or not) theories of others. 


Alfred North Whitehead pointed out that “The greatest invention of the 19th century 
was invention itself”. He meant that “invention” had become mainstream in a way that 
it most certainly wasn't beforehand. One way to see the big changes here is to look at 
the exponential rise in patents. Similarly, the idea of “progress” can be considered an 
“invention” of the 18th century: it wasn't that it hadn't happened, but most people 
expected to die in a world identical to the one they had been born into (except who 
might be king, etc.). 


Here we are talking about “invention”, and “progress”, and especially “science” as 
ideas in the larger culture. 


One large part of the epistemology of modern science is “making good maps”, both 
actually and as an analogy for the whole pursuit. Maps in the Middle Ages depicted 
stories that people believed (they showed where the Garden of Eden was etc.) and 
they were not careful enough for navigation. When the Age of Exploration got going 
again (the Greeks and the Romans had been extensive explorers) really good maps 
were needed again, and this led to gradually being more careful, paying attention to 
evidence, using various kinds of reasoning and reckoning, special tools, etc. And 
criticism, and a community. 


Practical navigators did not expect that a great map would be completely accurate or 
comprehensive. They also did not use the Aristotelian two-valued logic (in which 
“not-True” is “False”). The maps were “not-True” but accuracy counted, so there appeared 
many gradations of “not-True”, some of them critically important. 


Most of philosophy — and many people trying to do science — thought they would 
get to “Truth” if they were careful enough; if “‘God’ was a mathematician enough”, etc. 
So far this has been more of a snare than a guide. 


This leads us to two large inventions of modern science. The first orbited around 
Newton and was kickstarted by Brahe, Kepler and Galileo. Newton's “System of the 
World” results were so good, that they looked as though they might be “exact”. In the 
19th century, instrumentation and observation got so good as to find tiny but real 
discrepancies between the wonderful mathematics and observation, even before 
Maxwell's non-Newtonian models. 


At some point some people realized that they needed to deal with the hermetic 
nature of mathematics when attempts are made to model and deduce the physical 
world. Math itself consists of representations of forms and operations that preserve 
“truth in language” in careful ways (as Bertrand Russell pointed out: “p implies q”). To 
use this in science one needs to be careful about the mappings between the 
definitions of forms and operations to the phenomena we can glean by observation. 
The history of science is partly that of humans gradually finding more “out there” than 
expected. Right now there is no reason to expect that enough has been found to allow 
the universe to be deduced by reasoning. 


The second great invention of modern science came when science started to aim at 
“goodness of correspondence” between theories and observations, with special points 
awarded for good predictions, including ones that could find flaws in the theory. Just 
when is hard to pin down, but the 20th century is pretty coextensive with this way of 
looking at things. 


Also, for those interested in the philosophy of science, it is worth looking at the ideas 
of C. S. Peirce, William James, etc., who towards the end of the 19th century started to 
wonder if “modern science” could be defined well enough on pragmatic grounds 
alone. 


For a talk years ago I assembled and explained the excellent combinations of 
reasoning and observation of the Greeks during the Alexandrian era that led to a 
qualitatively accurate picture of our situation in the solar system. This included dealing 
with similar triangles, the size and the shape of the Earth, lunar eclipses, and the 
attempt to make a triangle of the Earth, the moon, and the sun (which was “good 
enough” to get “a huge distance to the moon” and an “enormous distance to the sun 
and its size”). This was “real science” in every sense of the word, but it didn't stick. 


Was this because of the lack of a larger community? The lack of the printing press? 
The social and governance systems of the time? The disinclination of the Greeks to do 
mechanics except as toys? Did this correlate with slavery? 


In any case, it's worth calculating human lifetimes since the time of Kepler and Galileo 
— at 70 years per, we have only about 6 lifetimes since then! And about 5 since 
Newton's “Principia Mathematica”. A lot has happened in those 5 lifetimes (especially 
compared to the 1000s of lifetimes before). 
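The lifetime count is easy to check by hand. The sketch below does so under stated assumptions: a reference year of 2018 for when this was written, and 1609 (Kepler's "Astronomia Nova") and 1687 (Newton's "Principia") as the landmark dates. The function name is only illustrative.

```python
# Rough check of the "lifetimes" arithmetic.
# Assumptions: the reference year and the landmark dates are chosen for
# illustration; only the 70-years-per-lifetime figure comes from the text.
LIFETIME_YEARS = 70
REFERENCE_YEAR = 2018  # assumed year of writing

landmarks = {
    "Kepler's Astronomia Nova": 1609,
    "Newton's Principia": 1687,
}


def lifetimes_since(year, now=REFERENCE_YEAR, span=LIFETIME_YEARS):
    """Number of back-to-back lifetimes of `span` years between `year` and `now`."""
    return (now - year) / span


for name, year in landmarks.items():
    print(f"{name} ({year}): {lifetimes_since(year):.1f} lifetimes")
```

With these assumptions the two figures come out just under 6 and just under 5, which matches the rounded numbers above.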


One way to look at this is that science has flourished via the combination of a few 
really unusual minds per century plus the fact that a large percentage of more normal minds
can learn and do what the unusual minds have turned up. This has allowed an 
enormous amount of work to go on in parallel without requiring a lot of serial 
coordination. It was exploding even before its impact on engineering and technology 
was appreciated. A good perspective on this is to look at the history of the physical 
sciences in Japan after the Perry expedition in 1853, and the Meiji Restoration in 1868 
that started the modernization of Japan. Just 67 years later in 1935, Hideki Yukawa,
educated in Japan, published his theory of mesons, and received the Nobel Prize for 
this in 1949. 


One way to think about all this is that the “invention of science” really did take place in 
several stages, the most important of which were the inventions of perspectives on 
how to go about finding and getting and making “pretty good knowledge” and to 
share this knowledge in ways that transmit the deep foundations of what has been 
discovered (including new ways to discover). 


Alan Kay
Still trying to learn how to think better - 5y


How would you tune a guitar to match a piano with stretched tuning? 
Would you tune to the lower bass notes being played or just the middle C? 


Robert Preston's answer is good. To understand what is going on, it's worth
contemplating why tuners would stretch the octaves (make an interval sharper than
a perfect octave) on a piano.


Seems like it would put the piano out of tune ... 


But it already is out of tune via the equal temperament system (which stretches the
3rds and narrows the 5ths). Also, the extreme tensions on most of the strings produce
“partials that are not harmonic” (meaning that the strings themselves are a bit off, and
the partials, especially the octaves, are not in tune). This produces beats and a sour
feeling. The stretched octaves are done to try to compromise between the
fundamentals and the partials for octaves (especially), and for other critical intervals
(like 10ths).*
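To get a feel for the size of the effect, here is a minimal sketch that uses a commonly used stiff-string approximation, in which partial n sounds at n * f0 * sqrt(1 + B * n^2). The model, the function names, and the value B = 0.0004 are assumptions for illustration, not measurements of any particular piano.

```python
import math


def partial_frequency(n, f0, B):
    """Frequency of partial n under the stiff-string approximation (assumed model).

    f0 is the fundamental of an ideal, perfectly flexible string and B is the
    inharmonicity coefficient; B = 0 gives exact harmonics n * f0.
    """
    return n * f0 * math.sqrt(1.0 + B * n * n)


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per pure octave)."""
    return 1200.0 * math.log2(ratio)


def octave_stretch_cents(B):
    """How far the 2nd partial lies above a pure 2:1 octave over the 1st partial."""
    first = partial_frequency(1, 1.0, B)
    second = partial_frequency(2, 1.0, B)
    return cents(second / first) - 1200.0


B_ASSUMED = 0.0004  # hypothetical inharmonicity coefficient
print(f"2nd partial is {octave_stretch_cents(B_ASSUMED):.2f} cents sharp of a pure octave")
```

Under this model a tuner who wants the upper note to agree with the lower note's second partial has to tune it a little wider than 2:1, and that is the "stretch".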


Because almost all guitars have non-moveable frets and the modern instrument dates 
from the 19th century, which was very invested in equal temperament, the frets are 
similarly out of tune harmonically (the fourth fret produces a note a tad sharper than a 
harmonic third, the 7th fret is flatter than a harmonic 5th, etc.). Lutes have moveable 
frets made from string material, and one can often adjust these to get really good 
3rds, 6ths, and 10ths for important notes in a given piece (this can be heavenly, and it 
would be great to have on a guitar). 
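The size of these fret errors can be worked out directly. The sketch below compares equal-tempered fret positions (each fret is a factor of 2^(1/12)) with the just ratios 5:4 and 3:2; the function names are only illustrative.

```python
import math


def cents(ratio):
    """Size of a frequency ratio in cents (100 cents per equal-tempered semitone)."""
    return 1200.0 * math.log2(ratio)


def fret_ratio(fret):
    """Frequency ratio of a fretted note to the open string in 12-tone equal temperament."""
    return 2.0 ** (fret / 12.0)


def deviation_cents(fret, just_ratio):
    """Positive means the fretted note is sharp of the just interval, negative means flat."""
    return cents(fret_ratio(fret)) - cents(just_ratio)


intervals = {
    "major 3rd (fret 4 vs 5:4)": (4, 5 / 4),
    "perfect 5th (fret 7 vs 3:2)": (7, 3 / 2),
}

for name, (fret, just_ratio) in intervals.items():
    print(f"{name}: {deviation_cents(fret, just_ratio):+.2f} cents")
```

The third comes out roughly 14 cents sharp and the fifth about 2 cents flat, which is why the 3rds are the intervals most players try to tweak.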


Most guitar players who are sensitive to tuning will “tweak” intervals for a given piece 
(especially to try to get 3rds not too sharp). This is tricky because many notes on a 
guitar are available at the same theoretical pitch on four or five strings — but they 
may not sound exactly the same pitch. Also, the nature of guitar strings, the short
scale of the guitar, etc., make it difficult for even fresh strings to really sound in tune
(G strings are notorious for this). For all the strings to use the same frets, the bridge needs to be
bumpy, and even this is not enough. Even just for tempered tuning, the frets on a
guitar need to look like this:


You can see that both the G and the B especially need very different fret positions (see 
“TrueTemperament” on the net for more info). And all this just gets you good equal
temperament tuning, not “perfect tuning”. 


So, back to Robert Preston. If you are playing with a piano, and you are sensitive to “in 
tuneness”, you will already be tweaking your guitar and grumbling, so including the 
extra tweaking to compromise with the piano is not a big addition to the overall 
chore. 


* I have a “fortepiano” in the style of 1795, and merely stretching the octaves is not
enough (not a uniform approach to design in those days). We found that we had to 
tune it via 10ths to get something that would sound nicely enough in tune (especially 
for early Beethoven). 


Alan Kay 
Still trying to learn how to think better - 5y 


Which is more difficult: playing piano or playing guitar? 


The answers to this question are like a cross-section of US culture. The one I thought
closest to useful was by Roberto Micarelli.


Here is a different way to think about this question, and many of the conclusions can 
be reached even if you don't play an instrument at all. 


If we think about playing an instrument from the point of view of both the performer 
and the composer, we should not be surprised that both of them try to explore every 
part of what is musically possible. The result is that in the very real sense of what it 
takes to do something extreme, every classical instrument is as difficult as any other. 


I have played all the classical keyboards (with a main emphasis on classical pipe
organ). And I've played most kinds of guitar (including jazz guitar professionally, and 
classical guitar avidly). The large differences between keyboards and the fretted 
instruments don't allow really interesting comparisons to be made. But — if a 
keyboard piece can be played on the guitar — the guitar can be made to sound and 
express almost cosmically more. This is because there are more expressive possibilities 
(the guitar can make many timbres for a note, it can vibrate and bend notes, etc.). 


But I don't think this is a centrally good way to think about this. Another way is to ask
“how many hours of practicing” is it going to take to play such and such a guitar piece 
on guitar, and such and such another keyboard piece on a keyboard. If you throw in 
the side condition that the result should be “musical” not just “technical” then — in 
my experience — the pieces that are “difficult” to “really difficult” on each instrument 
wind up requiring commensurate practicing. 


The beginning experiences on both kinds of instrument can be quite deceiving. For 
classical music, I think one can get going more easily on a keyboard (piano or
harpsichord) than on a classical guitar. (Violin has a tough startup period also.) At 
some point in the future after one has gotten through a few stages of fluency, they all 
start to feel about equally difficult, but each in their own way. People always ask me — 
especially pianists — if adding in the pedals on the organ for one or even two extra 
lines of music isn’t harder than piano or harpsichord. My sense is “no”, just different (it 
is rather like playing different things with each hand on a keyboard instrument — the 
two toes and two heels are basically adding one more four-fingered “hand” that is 
made from your feet!). 


A truly difficult instrument to play well is the cornetto — basically a hollow stick with 6 
holes and a very tiny trumpet-like mouthpiece. It is quite difficult to even get a note 
out of it. There are no resonances to help nail the pitches, every note has to be lipped, 
etc. There are only a few really good players in the world — but a great player can 
create a cosmic experience with the amount of expression that is possible. 


Having said all this about “all instruments being equally as difficult”, I also somehow
feel there is always more that can be reached for on guitar in terms of “mood”, “tone”,
“expression”, etc. Some of my favorite pieces played by my favorite players are rather
simple at the notes level (but are capable of deep cosmic expressive gestures). This is 
not a comparison with the keyboards (most of these dimensions are a bit more 
shallow on them). 


But just to give you an example, here is a classic piece from a century ago by the 
Paraguayan virtuoso Agustín Barrios, “Julia Florida”, a love song for one of his
students. The score is pretty simple and rather slow in pace (it could be considered an 
“intermediate piece” in the terms of the notes). But it is made for the player to treat 
every finger on the right hand as a violin bow and to magically render something that 
is not just chamber music, but is even “orchestral”. 


I like Matt Palmer's version of this, and in his YouTube recording he is playing a truly
great instrument by Michael Thames (his version of Torres’ “La Leona” guitar from
1856). Getting all of this “out to us” so well is as difficult as any more obvious “zillions
of notes” virtuoso piece. 


Alan Kay
I read a lot - 5y


Why hasn't Alan Kay published a book? 
Ouch! 


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - Updated 5y


What book(s) would you recommend to a computer science student? 


My basic answer is: read a lot outside of the computer field. 


It is worth trying to understand what “science” means in “Computer Science” and what 
“engineering” means in “Software Engineering”.


“Science” in its modern sense means trying to reconcile phenomena into models that 
are as explanatory and predictive as possible. There can be “Sciences of the 
Artificial” (see the important book by Herb Simon). One way to think of this is that if 
people (especially engineers) build bridges, then these present phenomena for 
scientists to understand by making models. The fun of this is that the science will 
almost always indicate new and better ways to make bridges, so friendly collegial 
relationships between scientists and engineers can really make progress. 


An example in computing is John McCarthy thinking about computers in the late 50s, 
the really large range of things they can do (maybe AI?), and creating a model of
computing as a language that could serve as its own metalanguage (LISP). My favorite 
book on this is “The Lisp 1.5 Manual” from MIT Press (written by McCarthy et al.). 
The first part of this book is still a classic on how to think in general, and about 
computing in particular. 


(A later book inspired by all this is “Smalltalk: the language and its 
implementation” (by Adele Goldberg and Dave Robson, the “Blue Book”). It also
contains a complete implementation in Smalltalk written in itself, etc.)


A still later book that I like a lot that is “real computer science” is “The Art of the
Metaobject Protocol” (by Kiczales, des Rivières, and Bobrow). The early part of this book
especially is quite illuminating. 


An early thesis (1970) that is real computer science is “A Control Definition 
Language” by Dave Fisher (CMU). 


Perhaps my favorite book about computing might seem far afield, but it is wonderful 
and the writing is wonderful: “Computation: Finite and Infinite Machines” by 
Marvin Minsky (ca 1967). Just a beautiful book. 


To help with “science”, I usually recommend a variety of books: Newton's “Principia”
(the ultimate science book and founding document), “The Molecular Biology of the
Cell” by Bruce Alberts, et al. There’s a book of Maxwell's papers, etc.


You need to wind up realizing that “Computer Science” is still an aspiration, not an 
accomplished field. 


“Engineering” means “designing and building things in principled expert ways”. The 
level of this is very high for the engineering fields of Civil, Mechanical, Electrical, 
Biological, etc. Engineering. These should be studied carefully to get the larger sense 
of what it means to do “engineering”. 


To help with “engineering” try reading about the making of the Empire State Building, 
Boulder Dam, the Golden Gate Bridge, etc. I like “Now It Can Be Told” by Maj Gen
Leslie Groves (the honcho on the Manhattan Project). He’s an engineer, and this 
history is very much not from the Los Alamos POV (which he also was in charge of) 
but about Oak Ridge, Hanford, etc and the amazing mobilization of 600,000 plus 
people and lots of money to do the engineering necessary to create the materials 
needed. 


Then think about where “software engineering” isn't — again, you need to wind up 
realizing that “software engineering” in any “engineering” sense is at best still an 
aspiration not a done deal. 


Computing is also a kind of “media” and “intermediary”, so you need to understand 
what these do for us and to us. Read Marshall McLuhan, Neil Postman, Innis, Havelock, 
etc. Mark Miller (comment below) just reminded me that I've recommended “Technics 
and Human Development,” Vol. 1 of Lewis Mumford’s “The Myth of the Machine” 
series, as a great predecessor of both the media environment ideas and of an 
important facet of anthropology. 


I don't know of a great anthropology book (maybe someone can suggest), but the
understanding of human beings is the most important thing to accomplish in your 
education. In a comment below, Matt Gaboury recommended “Human Universals” (I
think he means the book by Donald Brown.) This book certainly should be read and 
understood — it is not in the same class as books about a field, like “Molecular 
Biology of the Cell”. 


I like Ed Tufte’s books on “Envisioning Information”: read all of them.


Bertrand Russell's books are still very good just for thinking more deeply about “this 
and that” (“A History of Western Philosophy” is still terrific).


Multiple points of view are the only way to fight against human desires to believe and 
create religions, so my favorite current history book to read is: “Destiny Disrupted” 
by Tamim Ansary. He grew up in Afghanistan, moved to the US at age 16, and is able 
to write a clear illuminating history of the world from the time of Mohammed from 
the point of view of this world, and without special pleading. 


Alan Kay
Still trying to learn how to think better - 5y


30 years ago, what were the first 5-10 steps in designing and creating a 
computer? 


30 years ago was 1988, and many components for many computers (especially 
personal computers) were VLSI, and could to some extent be TinkerToyed together. 


But what if you had a special architecture in mind that would require considerable 
design at the gates and bits level? To some extent you would need to tread the ground
laid 20, 30, and 40 years earlier. However, the era of technology available would still 
affect quite a few of your early design decisions in almost every consideration: power 
requirements, connectivity/cables, various kinds of memory resources, etc. For 
example, the advent of the Intel 1103 1K bit memory chips in 1970/71 plus the 
available bipolar technology allowed a very small group at Xerox Parc to make a whole 
mainframe (which emulated a DEC PDP-10). Before this it would have been more
difficult, both because of what magnetic core memories required and because it was
harder to get microcode to run fast enough to bypass more discrete control memories.


The Mead-Conway techniques — developed at Parc and CalTech in the 70s — allowed 
university students to suddenly be able to design and make and fab their own CPUs 
etc in the MOS technologies. Many of the early RISC chip designs came out of 
Berkeley — via Dave Patterson — using Mead-Conway. 


In parallel, some of the early steps are to try to really understand and focus on the 
benefits desired and expected from what will be quite a bit of work, and to not over 
design. For example, the Xerox Parc Alto personal computer was “lovingly 
parsimonious” in concept (by Chuck Thacker) and is worth studying to see how a really 
great designer who was also a comprehensive computer scientist went about the 
business of doing serious design *and* engineering at the same time. One of his 
processes was to try to put as much of the cost into as much memory as possible, 
partly by making an emulation architecture that could emulate most of the slower 
devices in software rather than having to add hardware for these. The money saved 
here was used to make the “inner CPU” faster, which would allow more things to be
emulated, etc. He used an idea that was used on the old TX-2 supercomputer (many
program counters, and zero latency switching between them) to interleave up
to 16 microcode tasks. And so forth.
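The "many program counters" idea can be sketched in a few lines. What follows is a toy model and not the Alto's actual microarchitecture: the priority rule (a higher index wins), the wakeup mechanism, and all names are assumptions made for illustration. The point it shows is that when every task keeps its own program counter, switching tasks costs nothing, because there is no state to save or restore.

```python
class Task:
    """One microcode task with its own private program counter (toy model)."""

    def __init__(self, name, microcode, wakeup=False):
        self.name = name
        self.microcode = microcode  # list of (label, blocks_after) pairs
        self.pc = 0                 # never saved or restored on a switch
        self.wakeup = wakeup        # True while the task wants the processor


def run(tasks, wakeups, cycles):
    """Run one microinstruction per cycle from the highest-priority ready task.

    tasks: list ordered by priority, higher index wins (an assumption).
    wakeups: dict mapping a cycle number to the task indices woken at that cycle.
    Returns a trace of (cycle, task name, instruction label).
    """
    trace = []
    for cycle in range(cycles):
        for index in wakeups.get(cycle, []):
            tasks[index].wakeup = True
        ready = [task for task in tasks if task.wakeup]
        if not ready:
            continue
        task = ready[-1]  # switching is just choosing a different program counter
        label, blocks_after = task.microcode[task.pc]
        task.pc = (task.pc + 1) % len(task.microcode)
        if blocks_after:
            task.wakeup = False  # the task gives up the processor until woken again
        trace.append((cycle, task.name, label))
    return trace


# Hypothetical tasks: a background emulator that always wants to run, and a
# device task that is woken at cycle 2 and runs two microinstructions.
emulator = Task("emulator", [("e0", False), ("e1", False), ("e2", False)], wakeup=True)
disk = Task("disk", [("d0", False), ("d1", True)])
trace = run([emulator, disk], wakeups={2: [1]}, cycles=6)
for step in trace:
    print(step)
```

In the trace the emulator is interrupted after `e1` and picks up again at `e2` with no bookkeeping in between, which is the effect the zero latency switching is meant to give.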


Looking at things today, one can learn a lot by looking at how memory architectures 
can intermingle with "CPU" processing. There are enormous (even horrendous) 
bottlenecks to be found everywhere. 


One way to learn is to design and make some computers to try out various ideas and 
tradeoffs. You can buy inexpensive Field Programmable Gate Array development kits 
that will plug into your laptop that will allow you to even create supercomputers with 
dozens, even hundreds, of processing elements. The gates are all the same for 
everything so you can distribute storage and processing functions everywhere
(and even merge them). 


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - Updated 5y 


Why was personal computing an unusual research choice in the 1960s? 

I think Mark Miller's answer is pretty good. One distinction that might help is that
between “personal computing” and “personal computer”. The drive for interactive 
time-sharing of the large mainframe computers that started right around 1960 was a 
drive for "being able to work on your own computer whenever you wanted day or 
night”. 


Doing this in a general way — where it seemed that you had the whole computer to 
yourself to do anything you wished — was preceded by multiple terminal sharing of 
specific applications (for example in the SAGE air defense system of the 50s where 
very large physical computers (football field size!) would have 30 or 40 graphics 
terminals with pointing devices to track planes via real-time radar data). 


[Image: a SAGE console from the 50s]


Here is an example of a SAGE console from the 50s. And below is a diagram of a SAGE
“blockhouse”.


[Image: diagram of a SAGE blockhouse]


The dual mainframes were on the second floor which was about the size of a football 
field. The consoles were on the 4th floor, and there was an early version of a display 
projection system on the third floor. The last one of these blockhouses was 
decommissioned ca 1982! 


John McCarthy's take on this setup and the console was “Every home will have one of 
these” — because the analogy was to utilities for electricity, water, gas etc. John could 
envision such a utility for “information” (and that this could be an early way to make 
the “memex” that Vannevar Bush had described in the Atlantic and Life magazine in the
mid-40s). Today, this would be called a “cloud” idea. John was one of the forces to get 
MIT to make such a utility for its faculty and students, and this first opened up for 
interaction ca 1961. 


The experimental computer for SAGE was the TX-2 at Lincoln Labs, and it was much 
better set up hardware-wise for sharing than the commercial mainframes (MIT had to 
use a commercial IBM mainframe). But the whole machine was given to Ivan
Sutherland from 3am to 6am in 1962 so he could invent modern interactive computer 
graphics (he had to use half the capacity of this supercomputer to simulate a workable 
display system). 


The main designer of the TX-2 — Wes Clark — also created in 1962 one of the very 
first “personal computers” — the LINC — for use in biomedical labs (this was a really 
nicely appointed machine for its day). Consider that it had all of 2K 12-bit words as its 
main memory, all for one person to use for whatever they wanted! 


This — along with the earlier Bendix G-15 and the early minis such as the PDP-1 (also 
a Whirlwind and Lincoln Labs spinoff) — provided vehicles to think about how 
individual small computers would trade off with information utilities (which were very 
inefficient for the user-interface tasks, especially for graphical interaction). Early on, 
small computers were used both as standalone graphics machines and as graphical 
terminals. 


Doug Engelbart and the Augmentation Research Center (an ARPA project at SRI) gave 
what is now called the “mother of all demos" (it pretty much was!) 50 years ago this 
year. It used a time-sharing computer, but it was an archetype of how the research 
community thought about “personal computing” back then (I think of what he showed 
as “personal computing"). 


[Image: Bill English, co-leader of ARC, at an NLS terminal with black on white screen,
mouse and keypad.]

There was a large debate within the ARC community about
whether the next version of the system (in the early 70s) should be done again on a 
time-sharing system, or should be done on networked individual computers with a 
shared file facility. This eventually caused a schism, and a number of ARCers — 
including Bill and another central figure Jeff Rulifson — wound up moving to Parc 
(which was in the individual networked computers camp). 


Butler Lampson has pointed out that Parc in the 70s was pretty much the only place 
that was trying to render the "ARPA Personal Computing Experience” idea on 
networked machines each of which belonged to a single person (part of the network 
also included shared laser printers, next level file servers, and connections to the 
ARPAnet, and then Parc's own “internet”). And quite a few in the field did think this 
was a crazy research topic, especially those who believed that there were economies 
of scale in sharing mainframes (however, not if you valued user-experience). From the 
Parc standpoint, it was a very simple compelling idea in the world that Parc lived in, 
namely, the future 10-15 years out. We thought of what we were doing as simply 
building as much of that future early enough (a) for thousands of people to use and 
experience, and (b) to allow the much more difficult tasks of software and especially 
user interface design to have time to be invented. 


As I've mentioned before in Quora, the key to the success at Parc was to be able to 
control, invent, and make the entire designs of hardware and software with the 
resulting stuff just barely working — but working well enough. Less synergy would 
have doomed most of the goals. 


I'll leave the story here with these tradeoffs between individuals and utilities in mind 
— these still exist, and are still not done well enough. 


Alan Kay
Still trying to learn how to think better - 5y

Was anything meaningful lost for good when Xerox threw out their old 
PARC data disks? 


I don't think that anything technical was lost. But from the standpoint of actual
historical artifacts: certainly. As far as we know, the Smalltalk-78 image that we 
brought back to life a few years ago is the only extant version, and it was almost lost. 


We didn't generally recycle our disk packs, but just got fresh ones, so quite a bit of 
historical work was on the stash of disk packs (I left all of mine behind when I left Parc,
etc.) 




Who is working on developing the Dynabook today? 


Alan Kay 
I am the Alan Kay in question. - Updated 5y


| don't know specifically if anyone is working on the larger goals of the Dynabook. 
(When I first wrote this, I was as trapped in the physical design and the past, as
anyone. So | missed pointing out that Bret Victor and his DynamicLand group have 
been working on something larger than the Dynabook, and which is very much in the 
whole ARPA/Parc spirit of “learning to think, communicate, and act qualitatively better 
with the intellectual and learning amplifiers that are possible with interactive 
computing”. Check out the DynamicLand website.)




A cartoon I drew in 1968 after a visit with Seymour Papert. The children have
programmed their own version of SpaceWar, and are playing it using their 
Dynabooks (which are wirelessly networked, etc.) 


There are certainly a number of people working on parts of the same visions and goals 
we had 50 years ago when the Dynabook idea appeared. I was asked to write a
chapter about “What is a Dynabook?” some years ago, and this will help the context
here. 


It is generally thought of as perhaps the earliest notion of a laptop or tablet personal
computer, but the center of the idea was the kinds of services and contextual influences
that are a bit more subtle (and there were actually three physical notions about the
device: (a) tablet, (b) head-mounted display (via Ivan Sutherland), and (c) computers
everywhere in the world (via Nicholas Negroponte)).


Some of the contextual influences on me back then were variously from
Maria Montessori, Doug Engelbart, Seymour Papert, Jerome Bruner, Marshall 
McLuhan, etc. Papert’s work with Logo and children got my thinking to change 
radically from “personal computing as tools for adults” to “personal computing being 
like reading and writing", thus children needed to start the journey into this new kind 
of “reading and writing” early in their life. 


The “reading and writing” idea and children forces the computer to be regarded as 
“media” not just “tools”, and both of these ideas have contextual aspects (meaning 
that it's not just what is written in a book that is important but what happens to a 
person when they get fluent in reading and writing and using this media to extend 
themselves). McLuhan, especially, had deep insights about this: “We first shape tools
and then they reshape us”, “The *medium* [itself] is the [big] message!”


Montessori said (paraphrased) that “You can't teach the 20th century in a classroom”
(meaning you have to embed the new epistemologies into the fabric of the culture
around the children, and so a school has to be a culture of these new ideas, not a lesson).
Papert echoed these ideas by pointing out the difference between trying to learn 
French in a classroom vs living in France, and proposed that a “Mathland” that 
children could live in be manifested in a computer. 


Both Papert — and especially Jerome Bruner — asserted (correctly) that children 
needed “intellectually honest” versions of the new ideas (in science, math, etc), and 
that most adult versions of these subjects are not suitable for the developmental state 
of children’s minds — what needed to be done is to invent children’s versions of these 
that would be *real* math and *real* science, etc. 


And much more. In short, whenever you create technology, you are also creating a 
context that can deeply influence the course of thinking by the users (and this means 
that a technology is also a kind of curriculum). (These ideas were not at all thought 
about or known by the creators of Facebook, Twitter, etc, so they created disastrous 
“legal drugs” that most humans, especially children, have no defenses for.) 


Many of the ARPA researchers in the 60s (especially Engelbart's group) and Parc 
researchers in the 70s did understand the ideas about context and environment etc., 
and tried to use them to help what happens to users as they learn a new environment. 


For the Dynabook work, we wanted to invent and embed ideas that would help 
children learn to think better in the fabric of the user experience (and we considered
the existence of a reading and writing medium to include programming,
including of the system itself). Those who are interested in this might like to see the 
tribute | did for Ted Nelson a few years ago that used the Xerox Parc Smalltalk from 
1978 (and which was the flavor of Smalltalk that Steve Jobs saw the next year). 


The entire system was programmed in the very same language that the children used. 
This opened up many possibilities for learning and understanding down the road. 


A number of good things were done subsequently (HyperCard, Boxer, StarLogo,
Etoys (the ancestor of Scratch), etc.), but today, aside from Bret Victor's
DynamicLand, I'm not aware of any current projects that try to be comprehensive in
the large about “computing as a medium for powerful ideas” etc. (I don’t have positive
feelings about the current fad of “everyone should learn to program” because it's
really about “Why?” rather than “What?”.)


Alan Kay
Still trying to learn how to think better - 5y


What's the best Donald Trump joke you have heard? 
Donald Trump! 


Alan Kay
Have designed a few programming languages - 5y

Isn't getting rid of the evil "state" like Haskell's approach, something every 
programmer should follow? 

| wrote a Quora answer that touched on this question: Why is functional programming 
seen as the opposite of OOP rather than an addition to it? 


Part of John McCarthy's long ago insight (that after 60 years I wish computer people
had grokked by now) is that you just have to model time along with everything else
that is modeled. Then you wind up with parallel histories that have both the memory
of things that we want and need and the memories of when. This renders
“mutable” and “immutable” moot (“mute?”) and beside the point.
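One possible reading of "model time along with everything else" is sketched below. It is my own illustration, not McCarthy's formalism: the class name `History`, its methods, and the integer time stamps are all assumptions. Nothing is ever overwritten. A "change" adds a new (time, value) entry, and a read always names the time it is asking about.

```python
import bisect


class History:
    """Append-only record of (time, value) pairs for one named quantity (hypothetical)."""

    def __init__(self):
        self._times = []
        self._values = []

    def record(self, time, value):
        """Add what the value became at `time`; earlier entries are never touched."""
        if self._times and time <= self._times[-1]:
            raise ValueError("times must be strictly increasing")
        self._times.append(time)
        self._values.append(value)

    def at(self, time):
        """Value in effect at `time`: the latest entry recorded at or before it."""
        index = bisect.bisect_right(self._times, time)
        if index == 0:
            raise KeyError(f"no value recorded at or before t={time}")
        return self._values[index - 1]


balance = History()
balance.record(0, 100)
balance.record(5, 80)
balance.record(9, 120)

# Asking about the same time always gives the same answer, so reasoning about
# "the balance at t=3" stays unambiguous even though the balance changed later.
print(balance.at(3), balance.at(5), balance.at(7), balance.at(100))
```

Seen this way the record as a whole only grows and each reading at a given time is fixed, so the question of whether `balance` is "mutable" stops being the interesting one.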


John's desire to have both unambiguous reasoning about situations and actions and 
causes and to have changes over time led to his powerful ways to deal with these. I
think most of the confusions today stem from too limited a view on computing — 
very likely from taking programming practices as “computing” (whereas they are 
distressingly distant from each other at the present time). 


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 5y


How did Alan Kay first get started with computing? 


I got started via carpet-bombing reading as a just turned teenager in the early 50s. I'm
pretty sure that the catalyst wasn't science fiction because although scifi was into 
robots it had very little to say about computers in those days (Robert Heinlein did 
mention Howard Aiken at Harvard in one of his books). 


It was most likely something in Time magazine about “Giant Brains” that got me to go 
to the library and find a book with the title “Giant Brains, or Machines That Think” by 
Ed Berkeley, written just a few years earlier in 1949. This book was a popularization,
but had a lot of detailed technical material in it (Berkeley had worked on Aiken's Mark
I computer at Harvard, and was an expert in making logic mechanisms from relays).


Included in this book was a description of a table-top relay machine kit that Berkeley 
had come up with — called “Simon” (you could buy the plans from him). There is a 
footnote that I didn't notice at the time about how an 11-year-old kid named Ivan
Sutherland had helped build the first version of this machine and had written a 
subroutine to do double precision arithmetic! 


I didn't have the money to buy the plans or even start fooling around with relays (even
though they were easy to get on Cortlandt Street in NYC at the time). I was into
vacuum tube electronics and high voltages. 


A few years later, the next book I came across was “Faster Than Thought” by Britisher
B.V. Bowden about computing in the UK, especially at Manchester. This described 
many different computers and how they were built. 


About that time I was at Brooklyn Technical High School and the nearby technical
college (Brooklyn Polytech) had made a Simon kit and let people play with it. It 
couldn't do much, and the greatest pleasure must have been putting it together and 
getting it to work. 


A few years later (ca 1960) my friend Chris Jeffers was working for United Airlines in 
Denver wiring plug boards for the punched card machines in their IT organization. I
took a crack at a few things and thought it was fun. 


I wound up getting drafted in 1961 and was able to get into the Air Force instead of
the Army. One thing led to another, and while I was working in the path lab at a base
hospital I heard about “a test no one could pass”. This was the programming aptitude
test devised by IBM for the Air Force to screen for possible programmers. I liked to
take tests, and for a lark took it and passed it. 


This got me to San Antonio, Randolph Air Force Base, and Air Training Command. The 
initiation to programming was the intensive one week wall to wall course IBM had 
devised. At the end of it we knew the 1401 computer backwards and forwards and 
could program it. 


We couldn't design much. Back then the idea was to start programming other 
people's designs so one was useful while gradually soaking up the more difficult 
systems design ideas and heuristics. This was done by implementing flowcharts, and in 
particular, flowcharts of punched card applications that the AF wanted to convert from 
the huge floors of rattling machines to run on small cheap computers like the 1401. 
The designers and makers of the flowcharts were called “programmers” and the 
human compilers of these into optimized machine codes were called “coders”. 


The 1401 at Randolph was all of 8K bytes (6 bits of information + 1 bit “word mark"), 6 
pretty fast tape drives, card reader, card punch, and pretty fast chain printer. Programs 
were on punched cards, most debugging was desk checking. Testing was to get the 
machine for about 3 minutes max in a day to run the program, have it not work and 
usually crash, get an operator to flip a few switches, look at a few memory locations, 
and get a core dump to peruse. 


My best friend there was Rachon Andre Douglas from New Orleans, a former Russian 
linguist, who was a terrific programmer. I learned a lot from him, especially about
using the computer to help program and run itself. Even with the tiny 1401 we built a
micro-OS that could fit into a minuscule amount of the top memory to control the
running of jobs. 


There was also a Burroughs 220 there (also a very idiosyncratic architecture) that I
soon learned and started to program. 


But that's another story. 


Alan Kay
Have designed a few programming languages - 5y 


How would you design the perfect programming language? 
" ‘Better’ and ‘Perfect’ are the enemies of ‘What Is Needed’ " 


And the latter is usually a qualitative step or more above “normal”. This means that a 
good heuristic is to try to “dim out" the present and “normal”. For example, we could 
try to forget about all the programming schemes and languages we know about. 


This gets us to a simpler more difficult goal: to invent qualitatively more powerful 
ways to get computers to do what we wish. 


To do this we have to try to understand what it means “to wish". Can we make wishes 
in forms that are understandable in ways unambiguous enough to make progress? 


To help our thinking processes, we can search for fruitful analogies and see if we can 
get them to yield some gold for us. 


For example, we could look at the strengths and weaknesses of constitutional law, and 
ask what it would take to get a computer program to understand the law, and 
especially the meta-law in constitutions. 


For another example, we could look at Biology, and especially at morphogenesis 
(developmental biology) to get better ideas of “converging processes” that create 
viable systems in the presence of many kinds of noise and random problems. 


We can look at successful societies to see how distributed IQ, knowledge, and wisdom
can work to do more than any isolated component. We can try to understand how 
negotiation and explanation help the whole system design and build better. 


We can devise sample “wishes” to see what good wishes might look like — to start we 
should come up with 20-30 of these and play with them until we desperately want to 
make them work. Then we can think about what mechanisms would be required that 
we will have to invent. 


And so forth, and lots more. 


In other words, when very high visions and goals are called for, one has to do a lot of 
meta-stuff to somehow both escape the current context and also somehow use what 
has been learned without it getting in the way. I've advocated “Learn everything, and 
then forget it except for the perfume!”. Meaning: when you forget you open up to 
more things — more “scents” — you can close your eyes and ears and find your way 
through your nose. When something possibly interesting appears, its odor will remind 
you of something you might have run across in the past, and this will help you think 
about the new idea. This works a lot better than trying to extend the past into the 
future (you just get an overcooked version of the past, and worse, the past you use 
will be all too familiar!). 


The simplest heuristic is to deal with “wonderful goodness” for as long as it takes 
before trying to figure out just what steps are needed to start realizing “what is 
needed”. 


Alan Kay 
Still trying to learn how to think better - 5y


What piece of hardware in a computer matters the absolute most when 
trying to build a super fast computer? 


Many of the other answers addressed a somewhat different question than the one 
given. 


As Ivan Sutherland (the father of computer graphics and VR/AR etc) recently pointed
out at his 80th birthday celebration at SIGGRAPH, in the end the limiting factor and
most pernicious bottleneck are the wires. For the last several decades Ivan has been
working on “self-timed logic” (meaning that there are no external clocks, and no wires
from them, needed) and on “commingling processing and memory” (meaning there is
not a separate module called “CPU” and others called “memory”).


All these help with getting rid of and limiting the size of wires. Also, this approach can 
help understand just what the “action components” should actually look like. 


Alan Kay
Have designed a few programming languages - 5y


How do I learn programming if I hate complexity?


If you really “hate complexity”, then perhaps you should take a shot at learning 
programming (really systems design) to see if you can help. Even better if what you 
really hate is “complication”. (In several talks, I've made the distinction between
“complication” as “noise and mess introduced by humans” and “complexity” as
something intrinsic to the phenomena.)


The basic idea is that if “complexity” pays its way by being interesting and important, 
it is worth understanding, whereas “complications” should be weeded out by finding 
better ways to think about and represent ideas. 


A good example outside of computing are Maxwell's Equations, which Maxwell 
originally wrote in non-vector form and as (many) explicit partial differential equations 
— these were reduced to the "fab four” by Oliver Heaviside via introducing vectors 
and higher level operators (i.e. curl and div). In this case you have to learn something 
additional but you are rewarded by greater simplicity in the new context. 
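
For readers who have not seen them, here is the compact vector form that the passage alludes to. This is a sketch in the modern differential (SI) notation, which is an assumption on my part: the symbols and units below are the common textbook convention, not necessarily the exact form Heaviside wrote.

```latex
% Maxwell's equations in vector form (SI units, differential form).
% E: electric field, B: magnetic field, rho: charge density, J: current density.
\begin{align}
  \nabla \cdot \mathbf{E}  &= \frac{\rho}{\varepsilon_0} \\
  \nabla \cdot \mathbf{B}  &= 0 \\
  \nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}}{\partial t} \\
  \nabla \times \mathbf{B} &= \mu_0 \mathbf{J}
      + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}
\end{align}
```

The price of admission is learning what div and curl mean; the payoff is four lines in place of a long list of component-by-component partial differential equations.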


A parallel in computing is the move from many different idiosyncratic machine codes 
to designing “higher level languages” and then to design “very high level languages” 
(which could be statements of goals to be achieved by the system — so: 
“programming in terms of requirements”, the “whats” rather than the “hows”).


Much of real programming today should likely be in terms of meta-programming (like 
the above), but the “field” — such as it is — has preferred to try to scale up assemblies 
of weak building blocks directly. This has not been pretty or useful for the most part. 


More people who “hate complexity” (“complications”) are needed! 


Alan Kay
I am the Alan Kay in question. - 5y


What are Alan Kay's Learning Labs? 


A “Learning Lab" was a once a year (sometimes twice) for many years get-together of 
the most interesting people we knew to share ideas about learning, teaching, 
epistemology, art, etc., held away from the regular world. 


They were usually three day gatherings done collaboratively with the Apple Hill Center 
For Chamber Music in New Hampshire where I'd spent a number of summers learning 
how to play chamber music (partly as prep for the pipe organ playing I had started to
learn). The Center was on an old farm actually on “Apple Hill” outside of Keene out in 
the woods. The 1740 barn was the concert hall, there were lots of cabins and practice 
rooms. The motto of the place was "Great People, Great Music, Great Food, Live In A 
Shack”. When a second one was done in the same year, it was usually in California in a 
“place with vibes” (such as Ojai or Catalina). 




Later we invited Mitchel Resnick’s “Lifelong Kindergarten” group at the MIT Media Lab 
to be a main part. Some of the guests and regulars included Tim Gallwey, Seymour 
Papert, Marvin Minsky, Betty Edwards, Doreen Nelson, Arthur Hull, Richard Dawkins, 
John Steinmetz, Eric Stumacher, etc. 


I'm pretty sure I got the idea for these from having been invited in 1975 (I think by
Stewart Brand) to be part of “The Mind-Body Colloquium” which was held at the
Zen Center Monastery in Marin County, CA. Stewart formed the idea for this partly 
from his experience creating and running the Whole Earth Catalog, and then from his 
interview with and the article he wrote on Gregory Bateson. I think they cooked it up.
Stewart knew the abbot of the Zen Center — Roshi Baker — and it was a perfect place 
for this meeting. Some of the guests included: Bateson, anthropologist Mary Catherine 
Bateson (his daughter with Margaret Mead), Gordon Pask, Heinz von Foerster,
Humberto Maturana, Francisco Ayala, Terry Winograd, etc. 


Besides the discussions, we had a number of terrific learning experiences with the
monks and the cooks (this was the ultimate in veggie cuisine, which later formed the
famous San Francisco restaurant “Greens”).


Later at Apple, we started a large educational project, “The Vivarium”, and I
started to think about boosters for our thinking. I had already gotten Tim Gallwey
(“Inner Game of Tennis”), one of the greatest teachers I had ever met, together with Eric
Stumacher and John Steinmetz as birds of a feather. And we cooked up the Learning 
Lab idea (and Apple had the funding). 


There were perhaps as many as 20 of these, so right now I'm at a loss to compactly 
describe how they functioned, but I will come back to this to add a few sketches that
will give at least a gist of some of the activities. 


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 5y


What did both PARC and the “tonerheads” at Xerox misunderstand about 
each other? 


Xerox had many different kinds of people; I don’t think of them as “tonerheads”.


And while I was there (1970 to 1981) Xerox went through a number of ups and
downs — from being “the fastest growing company in the US" in the 60s, to having to 
write off about 20% of their annual revenues a few years later when the computer 
company they had bought — SDS (renamed XDS) — failed. And then to experience 
enormous pressure on their core business a few years after that when they were 
forced to divest their patents. 


From CEOs who were visionary — Joe Wilson — to his protege Peter McColough 
(somewhat visionary) — to a very poor choice — “nothing but business” Archie 
McArdell — to Dave Kearns (who might have been visionary, but got distracted). 


Part of the story is that, when no one believed in plain paper copying and Chester
Carlson's xerographic process ideas, Joe Wilson and Haloid (later Xerox) bought up
pretty much all of the patents and started adding their own. A good book to read is 
“My Years With Xerox: The Billions Nobody Wanted” by John Dessauer. 


Possibly pressured by IBM, who had refused to help in the 50s but then wanted in on
the success 15 years later, the FTC forced Xerox to make its patents public in 1975.
The Japanese were quicker to act and in a stronger way than IBM and Kodak: in just 4 
years Xerox's market share went from 100% to 14%. This was both serious, and also a 
huge distraction. 


Some of the people within Xerox — like physicist Jack Goldman who was the Chief 
Scientist — understood the implications of Moore's Law and some of the implications 
of computing. This led him to set up Parc in the first place to have labs for what he 
thought would be the key leverage points for Xerox's future: Physics, Optics, 
Computing, and Systems. Others in the company were not “digital thinkers”, and 
others were primarily business types who just wanted to make money safely, and who 
had moved in as the company went from a super-risk super-technical company in the 
50s and early 60s to a cash cow. 


It wasn't that the Parc scientists were hopelessly naive, but because of both the 
optimism and success and the nature of process that are part and parcel of science, it 
requires an enormous amount of effort on the part of a scientist or engineer to 
imagine that the majority of human beings don't have similar contexts, and even more 
so that they won't do something just because it is a good idea. Normal human 
decision making is much more like a mass mutual agreement on something that may 
have no particular rational or scientific basis (i.e. and e.g. “fads”).


Part of the idea of building lots of Altos, Ethernets, PUP nodes, laser printers, etc., was 
to make a “living lab” to vet the ideas, and the other part was to show Xerox something
that was much more than a demo of a technology. We thought that this would be 
extremely convincing. In fact — for a variety of interesting reasons — it wasn't at all. 
The big Boca Raton “exhibition” of the Parc technologies for the Xerox executives fell 
flat. In fact, their wives could see what it was much better than the execs. One of many 
interesting observations was that the execs couldn't imagine doing any work
themselves on an Alto. They could see some possibilities for secretaries, but missed
what might be called the “Engelbart Context” about collaborative work and planning,
and the “Lick and Taylor” context of “The Computer as a Communication Device” (a
1968 paper they wrote).


The big misunderstandings were almost entirely about what human 
misunderstandings are usually about: human beings have a “private universe” in their 
heads, and communication requires some agreement on some of the contexts and 
world views of the private universes. 


The big tragedy was not that Xerox missed what Parc did, but that the government 
funding that led to the large concepts of ARPA/Parc was discontinued in the early 70s 
by Congress (again from a lack of understanding). Parc was able to do the next stage 
of the ARPA ideas, but no commercial company could handle the ideas: not Xerox, not 
Apple, not Microsoft. That continues to be a tragedy of the first magnitude. 


Alan Kay
Still trying to learn how to think better - 5y


What was it like to be at Xerox PARC when Bill Gates visited? 
Did he visit? 


To my knowledge, he didn't while I was there (1970-1981).


Austin wrote me to say that he had gotten the idea from this Gates quote (“A Rich
Neighbor Named Xerox”): "Well, Steve, I think there's more than one way of
looking at it. I think it's more like we both had this rich neighbor named Xerox and I
broke into his house to steal the TV set and found out that you had already stolen it."
Bill neglected to mention that he was a few years late to participating in his analogy to 
a burglary. 


However, most of us who had been at Parc did not think of it as a burglary, and 
wanted to share the ideas, but the whole ideas. I once said to Bill “Bill, we want you
to take the ideas, but please take the whole idea, don’t make a grotesque caricature of a
misunderstood subset”. He did not look happy about this. Steve and Apple had much 
the same problem with subsetting. 


Part of the difficulty was that neither company was willing to deal with the whole 
problem of personal computing and networking, which requires both hardware and 
software to be designed and built. Parc had been able to do a lot 10 whole years 
earlier because it went after what it considered the main problems and created 
custom hardware and software working together to make systems that would do the 
job. In the 80s the Intel and Motorola CPUs were not sufficient to handle the 
computing that needed to be done, but both companies backed off and only did 
things that these chips could do (often with enormous amounts of code — for 
example, the screen painting primitive at Parc was about 300 instructions of Alto 
microcode, but became almost 1000 separate subroutines for the Mac's QuickDraw in 
68000 machine code). 


It took many years (decades) before these companies — always with immense 
resources compared to Parc — were willing to dip their toes into whole systems 
design. 


It's not clear that either company actually ever understood personal computing and 
networking at the corporate level. 


Alan Kay 
I am the Alan Kay in question. - Updated 5y


How did Alan Kay read 20k books? 


What I actually said was “I'm pretty sure I haven't read more than 20,000 books”.


I didn't count them. But I started reading quite early (~ 3 or so) and got fluent pretty
quickly, and would be termed “voracious” from age 5 on. I'm 78, so if I read a book a
day for 73 years, that would be over 26,000 books. I’m sure I haven't read that many.


My guess is around 16,000 or so in the 73 years I've been reading “at scale”. By the 
third grade I had been reading quickly and with understanding, but I realized that I'd
have to read books again if I didn’t remember them better over the long term.


Trying to explain “remembering well” is more difficult than “reading fluently” (both
are most definitely skills, like the skill parts of sports or music), so I will just look at
the latter here. 


Basically, skill learning is getting your “System 1"* to pay attention to the things you 
want to get fluent at. This system is almost invisible to our conscious mind and is hard 
to boss around (it actually bosses us around a lot more than is good for us). Tim 
Gallwey used to say “The parts of your mind you need to have learn don’t understand 
English!”. 


Repetition with good focus helps (I also think getting started really early on many 
“System 1” learnings really helps). 


One thing I found out was that trying to understand (i.e. “System 2”) while reading was
not always a good strategy; on the other hand, being relaxed and “somewhat
cognizant” would have big benefits hours later and especially the next day. I learned
to sight-read music later in life, and got to look at the process more closely than I did
at reading words when I was 5 and 6. Basically, you are trying to “grok” what's
important in the real-time phase of it, and then to let your memories and internal 
machinery chew on this for hours or overnight. Both kinds of reading are primarily 
about a kind of short-term memory for the real-time and then “allowing” the short- 
term memories to be retained. 


From the standpoint of today’s still limited knowledge about how the mind works, it
seems that we remember a lot more than we can retrieve, so learning how to recall
what you've been through once is one of the keys. This “recall” will often be more or
less invisible, but it can be fostered.


Well, I have strayed into “remembering” after all. One of the things that helped me a
lot was one of the fallouts from doing lots of reading: I started to have many more
things to make analogies with, and the analogies made remembering everything 
much easier. A side benefit is that analogies are also one of the best ways to have 
ideas (as Koestler's “Act Of Creation” explains).


So I think the answer to the question is “less than 20,000”, “very much like other high
level skill learning”, and “curiosity as a calling”. 


Adding a few more thoughts: the cliches are quite true about “the thresholds where 
you realize just how ignorant and limited you are via education, and especially from 
wide scale reading”. These can be quite daunting. The US Library of Congress lists
“More than 38 million books and other printed materials, 3.6 million recordings, 14
million photographs, 5.5 million maps, 8.1 million pieces of sheet music and 70 million
manuscripts, 5,711 incunabula, and 122,810,430 items in the nonclassified (special)
collections: more than 167,000,000 total items”.


So 16,000/38,000,000 = 0.0004 = 0.04% ! One hopes (perversely) that not all the 38 
million books are worth reading! 
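
The back-of-envelope arithmetic in this answer is easy to check. Here is a minimal sketch that uses only the figures quoted above; the variable names are mine, and treating a year as 365 days is a simplifying assumption.

```python
# Figures taken from the answer above; a year is approximated as 365 days.
YEARS_READING = 73          # years of reading "at scale"
BOOKS_ESTIMATE = 16_000     # the author's own guess at books read
LOC_BOOKS = 38_000_000      # books and other printed materials in the quoted list

# Upper bound: one book every day for 73 years.
book_a_day_total = YEARS_READING * 365

# Share of the quoted collection that the estimate represents.
fraction_read = BOOKS_ESTIMATE / LOC_BOOKS

# Average pace implied by the estimate.
books_per_year = BOOKS_ESTIMATE / YEARS_READING

print(book_a_day_total)          # 26645, i.e. "over 26,000"
print(f"{fraction_read:.4%}")    # 0.0421%, i.e. roughly 0.04%
print(round(books_per_year))     # 219, a bit more than four books a week
```

So the 16,000 guess corresponds to a little over four books a week sustained for 73 years, well under the book-a-day ceiling.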


More importantly, it is likely that there are more worthwhile books about than a 
human could possibly read in a lifetime. This means there are very likely worthwhile 
points of view — even critical points of view — that one hasn't encountered yet. This 
is a nice parallel — emotional as well — to the delights and dilemmas of doing and 
thinking science: you can’t wind up with the kind of blind certainty and belief that our 
genetically built brains crave and our cultures pander to. But by embracing a kind of 
uncertainty and judgement one can see many things more clearly. 


It is also a parallel to why cooperation is a better strategy and tactic than competition 
in science, engineering, politics, and in life. And why arguing should be about 
advancing enlightenment instead of trying to win. 


I also forgot to remark about the usefulness of “not so great books”. These provide an
important context for “what people want to communicate”, and greatly help the
appreciation of the great books. Fluency in reading helps a lot, so it doesn't cost much 
in time to read a “not so great book". 


* see Kahneman’s book “Thinking, Fast and Slow”


Alan Kay
I am the Alan Kay in question. - Updated 5y


What was Alan Kay's experience like working at Apple? 

There were many Apples in the period I was there (from about May 1984 to about
Sept 1996). This was not just the turnover in Chairmen and CEOs (from Mike Markkula,
to Steve Jobs to John Sculley to Mike Spindler to Gil Amelio, etc.) but also having the 
BoD throw out Steve in 1985 made a big difference. 


My opinion of the BoD was very low. They consistently passed up deep opportunities 
to grasp and make the future of personal computing and networking. For brevity, | will 
leave these out here. 


Also, very especially in Apple, just who was COO really made a difference (main job is 
to make things happen that are supposed to happen and vice versa). 


And who was head of product and marketing (e.g. the personality of Jean-Louis
Gassée had quite an influence both positive and negative).


And Apple Software when it was directly part of Apple and then spun to the side as 
Claris. 


Some of the high points revolved around Steve for the year we overlapped, and then
John Sculley. The latter was willing to fund for multiple years a big education project
I'd thought up (the “Vivarium"), and was the big backer of Hypercard when Apple 
Marketing didn't want to have anything to do with it. 


When John was forced out, there was no one left to really back “what's next?”. For 
example, Hypercard was (a) a great start (b) needed a brand new version (c) needed to 
be made the basis of an Apple WWW browser (think about how wonderful this would
have been!). The old Parc hands (who had also worked to help invent the Internet) put
a lot of effort into trying to convince management to have Apple software be the 
main source of high quality wide spectrum authoring for the web — but to no avail. 


The Newton was a good idea to do, but Apple Marketing forbade important features, 
fearing that it would then hurt the Mac. 


Etc. Etc. 


So there was quite a downward spiral happening after Steve and Sculley. I and my
group left in Sept 1996 as Gil Amelio came on board (as far as Gary Starkweather and I
could tell, he was both fearful (literally) and didn’t have a clue about what Apple was
and could be about).


An important point, I think, is that when Steve came back he was able to start a
new direction for the company — kind of “consumer conveniences” — but he was no 
longer interested in the ideals of personal computing and “Wheels for the Mind". This 
was a shame (but people who see no higher than money loved it). 


Despite all this, he and | remained friends. 


Alan Kay
Still trying to learn how to think better - Updated 5y


How can I practice to play a classical guitar tremolo faster?


Lots of good advice in some of the other answers. However, perhaps we can ask a 
larger question: how can I practice to play a *beautiful* classical guitar tremolo?


In other words: Technique should be the Servant of Art, not the Master. 


The quintessential tremolo piece is Tarrega’s “Recuerdos de la Alhambra” (Memories 
of the Alhambra). It is written as though for a solo instrument and accompanying 
instruments, and rather “orchestral” in nature. 


The Alhambra is an otherworldly breathtaking “castle” in Spain built at various times 
by Muslim, Christian, and Jewish artists, and is now partially in ruins. 


We don't have to have an explicit “program” to latch onto the wistful moods of the A
section in minor (could be “remembrances in moonlight”?), the B section in
major (“in the sun”?, or “now I really remember this wonderful place vividly”?), and the
second A section (“Oh, it’s just a memory”?).


If the solo voice were a violin or a flute or oboe, etc., we could imagine how a great 
musician might play it: there would be a fluid pulse, many changes in dynamics, and 
some in timbre. Here is what the score of the A section would look like to a violinist or 
flautist (leaving out a few little ornaments and showing it as it sounds): 


[Figure: score excerpt from “Recuerdos de la Alhambra”, showing the musical structure of the A section as a single-line melody]


The central idea here is that there is nothing about this piece that concerns the call for 
tremolo as “lots of notes” on guitar: Tarrega wants the player to create an instrument 
that can express the solo voice of this piece. Since the guitar doesn’t sustain well, and 
a note cannot be crescendoed, the tremolo will allow a “note” to be sustained and 
have variable dynamics — there is a kind of ebb and flow of dynamics indicated in the 
score. 


This is very much in keeping with what classical romantic period guitar is all about: to 
make a wide range of expressive gestures that evoke cellos, brass, percussion, flutes, 
and, here, something perhaps like a violin or an alto flute. 


Or if you wish: a new instrument that we can call “angel's wings”. 


Today we have the wonderful resources of YouTube, which are invaluable to any 
student of any music. We can try to find examples of “tremolo as instrument” rather 
than "tremolo as ‘speed’ " or “tremolo as ‘notes’ ”. 


My favorite example of a guitarist transcending the instrument to achieve this is Enno 
Voorhorst's version (which has been viewed by almost 350,000 people). 


Please use good headphones to really hear what is going on (and also ignore the 
“atmospherics” of the green screen background “in the snow”). It’s also worthwhile to
go to full screen to get a sense of just how relaxed he is playing this piece. 


Having a good sonic image of "the music to express” really helps when working much 
more slowly on the techniques needed. One suggestion here is to not worry about the 
speed of the tremolo (or piece: they are tied together) and to instead
concentrate on the balance between solo and accompaniment and then include the 
changes in dynamics — the louder and softer of the solo line — while keeping the 
background at roughly the same dynamic. 


You will start to notice that Enno is not playing this at a very fast tempo — the tempo 
is just right — and the key to the sound is just how evenly and lightly he is playing the 
32nd notes — there is nothing to remind you that they are individual sounds — the 
actual notes in the solo are made from all the 32nd flutters of the same pitch — and 
they quickly make a whole timbre of the “angel's wings” instrument which maintains 
perfect line throughout the piece. You can see that his right hand is hardly moving and 
that he is playing the tremolo on top of the strings rather than digging into them — 
yet he’s able to completely control the dynamics of this line (magic and art!). The 
illusion here is wonderful, and fits in with the “remembrances as illusions” feeling. 


This is a key to how to practice. It will take some months regardless — here you can 
concentrate on expression and evenness and lightness and dynamics and music and 
just keep everything slow, and get great enjoyment while learning. 


The first few bars of this piece are very good for learning tremolo playing (one of 
many hints is to learn tremolo on an inner string — Tarrega plays the first E on the B 
string — this will help keep the right hand fingers from flying open — and playing 
slowly will help even more). 


It's not a competition, sport, war, body-building, or a novelty act: it’s music, it’s art. 


Alan Kay


Have designed a few programming languages - 5y 


Is there any programmer with 50 years of experience? 


I'll nominate several of my heroes: 


Maurice Wilkes was the architect and builder and programmer of one of the very
first stored program computers, EDSAC, ca 1948-9 (which in fact is now being
re-created at the UK National Computer Museum in Bletchley Park).


Wilkes lived a long productive life, and was very active right until he passed away in 
2010. He was the second Turing Award winner, and received it for hardware, midware 
(he was the inventor of microcode), and software. 


In his reflections, he recalls starting to walk up the stairs in Cambridge, and was 
stopped by the thought that "I was going to have to spend most of the rest of my life 
fixing bugs in programs". 


| think we can safely think of Wilkes as having spent 60 years as a programmer and 
designer. 


Ivan Sutherland just celebrated his 80th birthday this year, and was feted at this 
year's SIGGRAPH. Both Ivan and his brother Bert were very involved in engineering as 
young children (their father was a civil engineer), and they both met Claude Shannon 
early on. Ivan at age 12 or so worked with Ed Berkeley on SIMON, a relay computer kit
that Berkeley had come up with. In his 1950 book “Giant Brains”, Berkeley credits Ivan 
with writing a program for SIMON to do double precision arithmetic. 


That was 68 years ago. 


And, yes, I started programming as a real job ca 1961-2 in the Air Force, so I've been
at it for about 56 years now (I don’t count the paper stuff I did before, because until
you get to debug, especially on a machine, “programming” is really a kind of
“planning” or “intending”).


Most of the programming I've done for the last many decades is “meta-programming”*,
which is still programming, but a bit different from what most people think of
when they hear the term. 


* I was just asked about “meta-programming” but the question disappeared. Here is
my reply: Well, there is the Wikipedia definition (the examples are a bit better than the
definition), and even though Wikipedia has its faults, I always look there first. So
take a look at that: Metaprogramming - Wikipedia


I would change the emphasis in that article to be more on design than writing code
(this is a good heuristic in general). Also the article does not mention “extensible
languages” as a key example of meta-programming and meta-design (the treatment
of DSLs is weak, etc.).


You can also get some clues from “meta”. In any kind of programming, one represents 
ideas and manipulates the representations. In “meta-programming” the 
representations are of programs and the stuff of programs and processes. 
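
To make “the representations are of programs” concrete, here is a small hedged sketch in Python using the standard `ast` module. It is only an illustration of the general idea, not anything from the systems discussed in this answer; the helper names `variables_in` and `make_function` are invented for the example. The program treats a formula as data, inspects its structure to discover its inputs, and then builds a callable from it.

```python
import ast


def variables_in(source):
    """Parse a formula and return the sorted names of the variables it uses."""
    tree = ast.parse(source, mode="eval")
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(names)


def make_function(source):
    """Build a callable from formula text; its parameters are read off the syntax tree."""
    params = variables_in(source)
    code = compile(ast.parse(source, mode="eval"), "<formula>", "eval")

    def fn(**kwargs):
        missing = [p for p in params if p not in kwargs]
        if missing:
            raise TypeError("missing arguments: " + ", ".join(missing))
        # Evaluate with no builtins, only the supplied variable bindings.
        return eval(code, {"__builtins__": {}}, dict(kwargs))

    return params, fn


# The quantity under the square root in Heron's formula, held as data.
formula = "s * (s - a) * (s - b) * (s - c)"
params, heron_squared = make_function(formula)

print(params)                               # ['a', 'b', 'c', 's']
print(heron_squared(s=6, a=3, b=4, c=5))    # 36
```

The ordinary program here would be the formula itself; the meta-program is the code that reads, analyzes, and repackages that formula.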


Alan Kay
I am the Alan Kay in question. - 5y


What does Alan Kay think of structural editors? 


I haven't tried a structural editor for so many decades that I don’t think I can comment
on what is around today. 


Also, | think that liking these or not has partly to do with temperament and style. For 
example, I turn off the grammar checking in my word processors when I'm trying to
write English, because I want to use all the brain cells I have for “flow” and “content”
intermingled. Similarly, for programming, I want to use my limited mental facilities for
design and bottom-up experiments. 


We did a very early version of “tile-based” (“block-based”) scripting in Etoys (later used
for Scratch) and this retained some of the bottom-up feeling while making a method. 


Another related topic is “mathematical expression builders”. 


On the other hand, I think Google's attempts to anticipate what you are trying to
search for are generally beneficial. 


Here is the rub (I think). A big question for all systems that know about structure and 
might have ways to more easily build the structure is what does it take to change your 
mind and make something different in part or all of it? I've advocated treating what's 
there as “flat” and being able to use a stylus to isolate and scrub the parts to be 
changed, etc., so there is not such a feeling of great burden. I haven't seen anything
like this (but there could be — it will be interesting to hear from readers just what is 
around now). 


There are other related comments here. For example, when color helps and when it 
gets in the way (similarly, other structural delineators). In all systems — whether tiles 
or not — one would like to “show structure” for some purposes, but also be able to 
“show flat” for others (for example gisting). 


It's likely that the answer — still — for today has to do with programming needing 
many points of view quickly and easily supplied by its UI — and I'd like to hear of any
systems that really do a good job here. 


Alan Kay 
Have designed a few programming languages - 5y 




What are some ways that programming was better in the past? 


This is the 19th answer — from someone who started programming in the early 60s, 
and had a sense of how things were in the 50s. 


I expected to see in the other answers at least one that pointed out that there was
always a range of “computer aided” in how programming was done, and that this
range very much depended on particular cultures of computing that often coalesced
around particular physical computers.


To take the 50s for example, during a time when most people were “factory hands”
with one or two batch times a day, with debugging being looking at octal core dumps, 
there were some who made interactive debuggers that vastly changed how 
development was done (check out the traditions of Whirlwind into DEC, etc.). 


Similarly, there were others who started thinking about having the computer do as 
much work as possible with the programming task: they started to invent and build 
higher level assemblers and algebraic languages. 


Some went cosmic (LISP was invented in 1958!) 


And others didn't want a factory production line model, so started to find ways to 
share “time-quanta” of the computer so that 10s and sometimes 100s could 
interactively program and debug. 


Meanwhile, most programmers were doing no more than coping with what was there 
(and doing a lot of work because of this). 


Almost 60 years later, I see the very same picture. Most of the programming I'm aware
of is done with little more technology than slightly enhanced punch cards and “system
builds” as isolated events. It is disheartening to hear what most organizations think of
as “agile” programming! 


Yikes! 


And still there are a few people — as there are in every generation — who are actually 
trying to do more than cope. And they occasionally advance the level of art a bit, while 
most of the others ignore the possibilities.


Alan Kay
Still trying to learn how to think better - 5y


Did Bell Labs and ARPA/PARC ever collaborate? 


In the 60s there weren't a lot of computer researchers, and they were generally 
friendly. So more or less everyone knew everyone. If you look at the history of Unix, 
you will see cross fertilizations both from the MIT Multics research and from Butler 
Lampson and Project Genie at Berkeley. 


BCPL by Martin Richards from the UK was very influential, and spawned 
implementations and adaptations at MIT, Lincoln Labs, etc., and Bell Labs (where it 
became “B” and then morphed into “C”). A similar process happened at CMU, where the
morphed language was called Bliss (Bill's Language for Implementing Systems 
Software — Bill Wulf). 


One of the most influential early language systems in the 60s from Bell Labs was 
SNOBOL by Dave Farber, Ralph Griswold and Ivan Polonsky. 


“Extensible Languages” were a big topic in the 60s, and Bell Labs contributed a few. 


Doug McIlroy was a fountain of ideas there, and contributed an extensible macro
system, the Unix “pipes” ideas, and many more. 


Perhaps the biggest difference between the two organizations in the 60s is that ARPA 
had a really cosmic vision about computing and networking — involving many topics, 
including interactive graphics and simulations, and world-wide networks and 
collaboration — and Bell Labs didn’t. I think ARPA was greatly helped to have about
3/4 of the projects in universities. 


In the 70s, I think Unix, C, the DEC PDP-11, and their tools created a center of gravity
that influenced most of the thinking at Bell Labs. Parc was bent on making a complete 
first pass on the cosmic ARPA vision, and this required it to make its own computers 
and displays and networks rather than buying from vendors, and to do all other parts 
of the hardware/software as needed — and to collaborate strongly with the ARPA 
researchers involved with “internetworking”. 


Alan Kay
Still trying to learn how to think better - 5y


How did Xerox PARC do engineering requirements management? 


I was one of the computing researchers at Parc (two of the four labs were devoted to
this: Computer Science Lab and Systems Science Lab — the other two were devoted
to optics and physics) from a few months after it started in July 1970 to 1981.


I never heard the phrase “engineering requirements management”, and I'll confess I’m
not sure exactly what it means (it is perhaps guessable). 


Since Bob Taylor did not want to have “management”, this could even be a moot 
question. 


However, we did have a number of really great engineers — like Chuck Thacker, who 
received the ACM Systems Software and Turing Awards for his work. Or Dan Ingalls in 
software, who received the Grace Hopper and the Systems Software Awards for his. 


We also had a number of people who might be better considered “scientists” 
(especially from temperament), but who could put on an “engineering hat” when 
needed. This “hat” combined with “deeper realer engineers” often produced excellent 
results (my favorite is Butler Lampson designing the very complex micro-coded CPU 
for the MAXC mainframe that we built in the first year of Parc). My view was “Well, 
Butler *is* amazing, but Holy Shit!”. 


One way to think about Engineering is that it combines “getting something to work”
with “in a principled careful enduring way”. This requires toughness along a number of 
dimensions, including with *process*. 


So: things are being “carefully managed” even in a larger system that has eschewed as
much of “formal management” as possible.


In practice, the general aim of the computerists at Parc was to realize “the ARPA 
Dream” of pervasive personal computing internetworked worldwide with artifacts that 
could be and were replicated at the level of 100s to 1000s. This required a lot more 
engineering than just making a demo of a great idea. 


Alan Kay
Still trying to learn how to think better - 5y


Why do people see the fretlessness of the violin family as an obstacle? They 
miss notes just as much in fretted instruments. 


An additional wrinkle not covered in the earlier answers is the question of playing 
polyphony with “as many voices as the piece demands”.


There are somewhat substantiated theories that the flatter fingerboard and bridge of 
the older bowed strings — before the violin family was converted ca 1800 to being 
able to put out a lot more volume — plus perhaps variable tension control in the 
“German-style” bows with slacker hair that could be tightened and loosened with the 
right hand to get more or fewer strings — allowed at least three voices — maybe four 
— to be played together without arpeggiating. 


For this kind of music, the fretted viols help getting the pitches of the multiple voices 
“good enough”, whereas the unfretted violin family presents really great difficulties 
when going beyond double stops or using open strings to help. 


I recently took up the lute, and found that the moveable frets really help to get the
important major 3rds and 10ths in a particular piece in excellent tune. I wish I had
them on guitar (baroque guitars have them but with fewer than 6 strings). I also
have a vihuela with 6 courses of strings and gut frets. Changing the tuning of the third 
course will provide the guitar string relationships, so one can experiment with “better 
tunings for (some) guitar music”. This is quite interesting! 


One of the main reasons given for the fixed metal frets for guitars from the 19th 
century onward — besides fixing on equal temperament — is much longer sustain. 
This is certainly the case. 


But, as all guitar players know, if you are sensitive to tuning, the equal tempered 3rds
(and 10ths) range from annoying to painful. (There have been experiments over the
years with “microtonal” guitars with moveable metal frets...)
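
The size of the mismatch is easy to compute. The sketch below assumes the usual reference point, the "just" major third with frequency ratio 5:4 (the answer does not name a reference tuning), and measures intervals in cents, where an octave is 1200 cents.

```python
import math


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)


# Assumed reference: the just major third, frequency ratio 5:4.
just_major_third = cents(5 / 4)

# Equal temperament: a major third is 4 of the 12 equal steps in an octave.
equal_major_third = cents(2 ** (4 / 12))

difference = equal_major_third - just_major_third
print(round(just_major_third, 2), round(equal_major_third, 2), round(difference, 2))
```

The equal tempered third comes out roughly 14 cents wide of 5:4. A 10th is a third plus an octave, so it carries the same error.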


And equal temperament is not as offensive with the more mellow sound one gets 
from “no nails”, something I've just recently been able to hear comparisons of good
players with and without. 


“No nails” is the mainstream for lute playing (with perhaps some nails used for loud 
continuo playing on theorbos). I've found that it creates a much wider richer palette of 
color than using nails. “No nails” on 19th century guitars was also more normal, and 
this also brings an amazing richness of expression. 


Here's a YouTube video of the wonderful Anna Kowalska playing a replica of a ca.
1820 French “parlor” guitar (after Lete). She has tuned it to A = 415 Hz, and is using Savarez
low tension strings (which she says are “the right balance between good and evil”
compared to gut). See what you think! 




Alan Kay
Still trying to learn how to think better - 5y


What is the first IDE you remember using? 


I think that the first UI I used that could carry out interactive development and
intermingled live code and live debugging was the CAL system for the Project Genie
Time Sharing System (which became the SDS-940, and was used by Engelbart for the
“big demo”). CAL was ca. 1965-66, and primarily made by Butler Lampson while at
Berkeley. 


CAL was an interesting system itself, in that it was an attempt to make a very JOSS-like 
end-user system, but (a) within the limitations of Model 33 teletypes, and (b) to have it 
be able to do instant real incremental compiling. The result was not as pretty as JOSS, 
but was as easy to use, and the incremental compilation allowed some serious 
programs to be developed on this very early small but consequential machine. 


JOSS was the first great end-user programming system. It predated — and was better 
than — Dartmouth Basic (which wound up using a few JOSS features). It was primarily 
the work of one remarkable person — Cliff Shaw — who also wrote the first great 
paper on user interface design. This was a HW/SW design (where great care was
taken to design and build every part of the system, including the interactive 
typewriters used as terminals). 


JOSS was followed by two important systems. JOSS II was a general facility for RAND, 
and again with specially designed terminals — a truly beautiful system: one felt a 
pleasant quiver just from the knowledge that one was going to be able to use it that 
day. The second landmark system at RAND was GRAIL (GRaphical Input Language),
which required the first great tablet-and-stylus to be invented (in 1964 — the year of 
the mouse) and the first great gesture recognizer to be invented and made. 


Alan Kay
Have designed a few programming languages - 5y


Do | lose something when learning OOP in Ruby instead of C#/Java? 

Here’s what I think is a good analogy, but may not have universal reach. After a layoff*
of about 4 years from guitar I recently started to learn lute: also with strings and frets
and played with most of the fingers on both hands. But: the lute has mostly double 
strings, it is tuned differently, the notations are different, etc., the hand positions are 
different (the right hand very different), the frets are made of gut and very low, and 
there are many other idiosyncrasies that differ from the also many oddities of the 
guitar. 


This continues to be an interesting process. The thing that transferred the most and is 
helping the most is “what music is all about”: how melodies, harmonies, counterpoint, 
rhythm, phrasing, “line”, etc. work, being able to hear internally what the music is, and 
so forth. Then the work and fun becomes how to get the music through the 
instrument to add its “special stuff” to what finally leaps into the air. New skills have to 
be learned, etc. Some of them will take months and years. 


Similarly, my take on programming something is in terms of “its music”, and for me 
this is always in the form of and in terms of a dynamic systems design, with different 
parts of the system interacting with each other, and where finding/identifying the 
parts, relationships, and interactions is the critical part of the design. In the end it will
be the systems design that will either help or hinder the actual programming, the 
debugging, and especially the down the road maintenance and inevitable changes. It 
is worth trying to do a good job here! 


Then the question becomes: what does it take to make that system with the tools at 
hand? This usually will require some new tools to be made from the resources 
available. We know we can do this, because the resources we use are Turing complete 
enough to allow us to make just about anything we need and want. 


This gets us to the question: how much work is this going to be? And different
programming systems will have different loads depending on the problems. Some of
the loads might be expressive, some might require high efficiency.


Because we can't discount the design requirements and possible learning curves we 
might need to get on, we could even wind up working out the design with models 
made in one language, and then switching to another language to carry out some of 
the pragmatic needs. An early 60s stellar example was the Randell and Russell
Algol-60 system on the KDF-9 computer — they made two semantically equivalent 
systems: one a dynamic interpreter for interactive debugging, and the other a 
grinding optimizing compiler. The Brooklyn Union Gas company system of the late 80s 
was designed and prototyped on workstations using a dynamic OOP language, and 
the semantics of this were rendered — via PL/1 ! — on their IBM 3090 mainframes for 
the day to day deployment. 
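
A toy sketch of the "two semantically equivalent systems" idea, in Python. This is not the Algol-60 or Brooklyn Union Gas design; the tuple-based expression language and the names `interpret` and `compile_expr` are invented for illustration. One function walks the program every time it runs (easy to step through and debug); the other translates it once into nested closures.

```python
import operator

# Hypothetical mini-language: numbers, variable names, and (op, left, right) tuples.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def interpret(expr, env):
    """Walk the expression tree on every run."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    return OPS[op](interpret(left, env), interpret(right, env))


def compile_expr(expr):
    """Translate the tree once into a Python closure with the same meaning."""
    if isinstance(expr, (int, float)):
        return lambda env: expr
    if isinstance(expr, str):
        return lambda env: env[expr]
    op, left, right = expr
    fn, run_left, run_right = OPS[op], compile_expr(left), compile_expr(right)
    return lambda env: fn(run_left(env), run_right(env))


program = ("+", ("*", "a", "b"), 2)
env = {"a": 3, "b": 4}
fast = compile_expr(program)
print(interpret(program, env), fast(env))
```

Because both routes are defined over the same program representation, the design can be worked out with the first and deployed with the second.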


So: my advice is try to avoid “learning OOP” via a language. “Real OOP” is not about
making abstract data types, setters, or most of the things that are typically done in
them. “Real OOP” is about dynamic systems, and the way to learn it is to start
designing and building them and gradually build up a feeling for scaling, 
modularization, intercommunications, etc. Then you will be able to choose how to use 
the (too many) degrees of freedom available in almost all programming languages. 


*I had to give up guitar playing about 4 years ago because the bracing stress on my
left thumb had broken down the base joint beyond repair. The lute has the possibility 
of not having to use thumb bracing very much (and this is working so far). 


Alan Kay
Have designed a few programming languages - 5y

Why does Smalltalk not become a commonly used platform for learning 
presentations and video lecture constructors? Squeak looks great as an 
interactive presentation platform for college lecturers. 


Commenting on Eliot Miranda's answer. 
Hi Eliot 


Squeak Smalltalk was largely made so we could do Etoys for children: an ancestor and 
superset of the later Scratch. 


From the child's point of view I wanted it to be an “object media environment” with
drag-and-drop scripting and massively parallel execution. The design drew from the 
underlying Smalltalk, Logo, StarLogo, and a powerful superset of HyperCard, in which 
Smalltalk’s multiple “desktops” acted as the “cards” and could be used as “slides”. This 
allowed a desktop/card/slide to exhibit anything, including work in progress, dynamic 
content of all kinds including simulations, animations, videos, sound and sound 
synthesis, 2D and 3D dynamics, collaboration between other Etoys systems on the 
Internet, etc. 


This worked both as a general authoring environment, and as a superset of a 
presentation system such as PowerPoint because we could use the children’s scripting 
to easily control all the dynamics we desired, including sequencing within each "slide", 
going to next slide, effects, etc. 


A white paper about this system is here.
Here is a white paper about children’s use of this system to learn science.


This was done 20 years ago in the late 90s and was perhaps at its prime in the early 
2000s. I gave many talks using this system, and it was definitely the most convenient
of all the ones I've used. 


The graphics — basically Morphic — in Squeak Smalltalk was of PowerPoint quality in 
those days, but — for lack of funds — was not improved to keep pace with the 
rendering engines used by MS et al. At some point PowerPoint started to look better 
while still doing a lot less. 


In the STEPS project ca. 2008 Dan Amelang did the Nile DSL which was able to get
state of the art quality rendering in an amazing 500 lines of “executable math” that
was also fast enough to be useful for most presentation purposes. I did many talks
with the STEPS system also, but these did not involve Etoys.


I've been amazed that no Squeak Smalltalk people have taken the Nile system and 
replaced the old Smalltalk rendering with it. 


Dan Ingalls et al took a pass putting the Morphic model over JS in the browser (in the 
“Lively Kernel” system). This has very high quality graphics, but they didn't carry 
through with a real end-user framework (like Hypercard or Etoys) that would allow it 
to be easily used in the manner of Squeak Smalltalk.


So, unless I'm mistaken, I don’t think that today there is an end-user authoring system
with state of the art graphics and media and with the expressive power of the 
Squeak/Etoys Smalltalk system of ca 15 years ago. 


Note: today I think the way to do this would be to use JS as an assembly language and
build an entire system over the top of it, and especially to use the embedding features 
available to allow external media to be presented inside the system. For fun Dan 
Ingalls, Bert Freudenberg et al did this with the Smalltalk from 1978 — the version that 
was shown to Steve Jobs the next year in 1979. I used it in the same way as recounted
above to do the presentation for a tribute for Ted Nelson.


Alan Kay
Still trying to learn how to think better - 5y


Why don't more classical musicians improvise?


Lots of very different answers here, partly depending on where in the last 600 or so 
years one looks. Here I'll just mention some of the parts of early music that require 
improvisation, especially continuo (playing “accompaniment” on a chordal instrument 
such as a lute or harpsichord), and especially 17th century continuo. 


The most often found form to work from is a bass line with a notation for harmony 
but no further notes. If you are lucky to be the only other player besides the soloist (or 
another bass instrument doubling the bass line), you can also make up the harmony 
as well as the notes to express the harmony. Depending on the skill of the performer 
and the style of the piece, the “realization” can be anything from “just chords and 
bass” to complete contrapuntal parts with new themes. 


Also, a lot of early music was written as “outlines” which could be played straight, but 
which most performers would not just ornament, but play “doubles” which would be 
further whole choruses with parts or all improvised. Examples abound here (e.g. 
Dowland's lute works and songs from the early 17th century, and Louis Couperin (“the
uncle”) about 50 years later for harpsichord). 


Finally, as mentioned in some of the other answers, in most previous periods of 
classical music, a performer was also expected to be able to improvise on the spot in 
most forms (often by having someone in the audience sing a few notes for a theme). 
This art lives on, but only in a small percentage of current day classical musicians 
(especially organists, and some pianists — for example look up some of Richard 
Grayson's wonderful improvisations in concert on YouTube).


Alan Kay
Still trying to learn how to think better - 5y


Does a computer that has more data/information (huge amount of data) 
weigh more than the same type and model computer that does not have 
any data on it? 


There is an old joke — that I heard best told by Butler Lampson — about an aircraft
loading engineer who was worried about “the extra weight of data” on punched cards. 
It was explained to him that a large amount of data actually weighed less because “the 
information is in the holes” ! 


Alan Kay
Still trying to learn how to think better - 5y


What majors are best prepared for jobs that don't exist yet? 


I like Vishnu’s answer. I had planned to sit this question out because I don’t think “job”
per se is a big enough idea or goal (just as I don’t think “making money” per se is a
good enough idea or goal for business). 


I think the trick is to home in on what you love while trying to notice how doing it well
could also help provide food, clothing, and shelter. 


Pretty much everything I was interested in strongly enough to do deeper learning
wound up helping me later in life — and the field I accidentally wound up in didn't
exist when I was born, and has changed many times over the last 80 years.


So: go for happiness for sure! 


Alan Kay
Still trying to learn how to think better - 5y


What's a uni-class? 


My version of “object-oriented” from 1966 on revolved around the active entities in a 
working system — I thought of these as being “full fledged computers” on a network
and communicating via messages (this paralleled and was inspired in part by ARPA’s
plans to do this with physical computers). With “virtual computers” I realized that you
could do the neat thing of making everything — including the virtual networking and 
the messages, etc., and the stuff all this was made from — with “virtual computers” (all 
the way down). 


Other inspirations were Sketchpad, the first Simula, etc., and — a bit later — Lisp. 


The ideas that seemed solid in this scheme were the “virtual computer” and
non-command messages idea (also implying encapsulation).
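
A minimal sketch of that pair of ideas in Python. The class and method names (`VirtualComputer`, `understands`, `receive`) are invented here and are not Smalltalk's mechanism; the sketch only shows state that is reachable solely through messages, with the receiver (not the sender) deciding what a message means.

```python
class VirtualComputer:
    """A small 'computer': private state plus a way to receive messages."""

    def __init__(self, name):
        self.name = name
        self._state = {}      # never touched from outside, only via messages
        self._handlers = {}   # message name -> function that decides the response

    def understands(self, selector, handler):
        """Teach this receiver how to respond to one kind of message."""
        self._handlers[selector] = handler

    def receive(self, selector, *args):
        """A message is a request; the receiver chooses how (or whether) to act."""
        handler = self._handlers.get(selector)
        if handler is None:
            return ("not understood", selector)
        return handler(self._state, *args)


counter = VirtualComputer("counter")
counter.understands("increment", lambda state: state.update(n=state.get("n", 0) + 1))
counter.understands("report", lambda state: state.get("n", 0))

counter.receive("increment")
counter.receive("increment")
reply = counter.receive("report")
unknown = counter.receive("reset")   # nothing breaks; the receiver just declines
print(reply, unknown)
```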


There also needed to be a way to make instances of “ideas for virtual computers”. 
Sketchpad and Simula had similar ideas (that we'd call “classes” today). There's a
top-down-ness in classes that is a bit of a barrier to “tinkering around” to “find” good
systems organizations. 


The first Smalltalk (-72) — see the history — was inspired by several important ideas in 
Lisp. One of them was the relationship of a “function” and “apply” to what sending 
and receiving messages might be. Another was that some version of the four or so 
ways to think about practical closures produced an entity that could be like an object. 
There is also the relationship to the practicalities of evaluating a function to the 
storage structures needed. 
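
One way to see the closure point is a small Python sketch (an illustration only, not how Smalltalk-72 was built; `make_counter` and its message names are made up). The function returned below captures its own private variable and responds to messages, so each call to `make_counter` yields something that behaves like a separate object.

```python
def make_counter(start=0):
    """Return a closure that keeps private state and answers messages."""
    count = start  # lives on in the closure after make_counter returns

    def receive(message, *args):
        nonlocal count
        if message == "increment":
            count += args[0] if args else 1
            return count
        if message == "value":
            return count
        raise ValueError(f"message not understood: {message}")

    return receive


a = make_counter()
b = make_counter(10)   # a second, independent "object"
a("increment")
a("increment", 5)
print(a("value"), b("value"))
```

The storage question mentioned above shows up here too: `count` has to outlive the call that created it, which is exactly what the closure's captured environment provides.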


The basic idea here is that one doesn’t want to make a big psychological or action 
commitment about “type” or “class” until you can find out what it really is that you are 
trying to do. Both the language and the user interface should help with this. 


For example, in Squeak Etoys, every object could be thought of as a “uni-class” 
because it is a single thing that can be used like an object but can also be a factory for 
similar objects. The language Self used prototyping instead of classes, and this is 
similar. Henry Lieberman wrote a great paper about prototypes and delegation as a 
way to make similar things that could piggy back on others but also have their own 
characteristics. I wrote a few screeds at Parc calling for “exemplars” — something
between a class and a prototype (or perhaps orthogonal). 
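
A rough Python sketch of the "single thing that is also a factory" idea. It is an assumption-laden illustration, not the Etoys or Self implementation: `UniObject` and `make_similar` are hypothetical names, and copying is only one of several ways siblings could be produced (delegation would be another).

```python
import copy


class UniObject:
    """Hypothetical stand-in for a 'uni-class': usable directly, and copyable."""

    def __init__(self, **slots):
        self.__dict__.update(slots)

    def make_similar(self, **overrides):
        """Use this object as a factory: copy it, then adjust the copy."""
        sibling = copy.deepcopy(self)
        sibling.__dict__.update(overrides)
        return sibling


# Tinker with one concrete thing first...
car = UniObject(color="red", x=0, speed=2)
car.x += car.speed

# ...then make more things like it, without having declared a class for "cars".
truck = car.make_similar(color="blue", speed=1)
print(car.color, car.x, truck.color, truck.x)
```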


If you can have lots of classes — and modern languages can — it is possible to use 
some of the existing mechanisms to make a more bottom up experimentation 
environment that allows a lot of flexibility, is still using real objects — not data 
structures with attached procedures — and that — in a late-bound language like 
Smalltalk — can then be “elevated” (“beatified”) into stronger abstractions as the
systems design becomes more clear. 


One of the reasons this is really important is that design is *not* running to the library 
to see what's there. It is generally much better for the language and SDK to be good 
and flexible enough to allow “scratch programming” of a model of the system to be 
done easily and without pain. This will guide whether anything in the library will 
actually help matters. Trying to program the solution before making a model is really a 
bad practice. 


I view the “instance producing” mechanisms and conventions in OOP as still being a
not yet well enough solved problem — so it's always good to try to get more people 
thinking about this. 


Alan Kay
Still trying to learn how to think better - Updated 5y 


What does Alan Kay think of this remarkable initiative? 


I donated to this because I think some good might come from it. I'm not a fan of
competitions — I prefer cooperations — so I think I would have at least cast this in
terms of teams, which is also more realistic in what it takes to deal with most 
important issues and problems (Richard Eng reminds me in the comment below that 
this is cast in terms of teams — adding in the equivalent of what are sometimes called 
“friendlies” in soccer, or spring training games in baseball, will enrich the eventual 
competition). 


Also, one of the points of this more than 40-year-old language was that it contains
the metadefinitions to make better programming languages. It was quite
disappointing to see that most people were merely programming in it when it was
released, as opposed to building more expressive systems in it.


So | think I'd also try to nudge this in the direction of taking a few really important 
things that weren't practical for us to do at Parc, and to at least extend the existing 
Smalltalk to do these. For example, we really wanted “Uni-classes” in the 70s that 
allowed more bottom up programming — Smalltalk-72 actually had almost this — but 
the later Smalltalks were organized around classes. Both of these worlds — and the 
world of prototypes and delegation — can co-exist, and a good team effort could be 
to find good ways — including in the UI — to do this.


• In response to a few of the comments, I'm adding something more here
about “prototypes” (which I've also discussed elsewhere). The basic desire
and need is — after coming up with something useful — to be able to create
more things like it. Almost as important is that the “something” gets
“beatified” into a larger concept/abstraction that can help thinking. Note that
a large part of the schemes for such things in OOP languages, especially
dynamic languages, are to come up with something more expressive and
useful than “types” (or, from another POV, to elevate the idea of “type” into
something really worthwhile).


• Classes are a pretty top-down way to do this, and I've never liked them
because I think that design is aided by being able to make things before
having to decide just exactly what they are. Prototypes (perhaps with
delegation) are at another extreme (in some languages they just become the
root of the links, in others they are partly copied and partly linked) — they've
always felt “too wiggly” to me. One of the ideas we talked about at Parc was
“Exemplars”, which were kind of “sample or ‘typical’ objects” that would have
more expressive power than a class, and more stability than a prototype. For
example, in Smalltalk “Class Document” is not very explanatory, but more like
a framework. “Exemplar Document” would be instantiated to make a typical
document, and could also serve as a framework. Note that a typical “part” in
an exemplar would likely be different from a default or a “master version” (all
these are probably needed).


• I've not particularly liked any of the gestures at these problems over the
years. I think there is much needed to do here, and a really good approach to
what the abstractions should really look like, and how they get made, is yet
to happen.


There are many other things that could be done. For example, the Etoys framework in 
Squeak Smalltalk worked really well for late elementary school — one of its offspring 
was Scratch — but could vastly be improved (and Scratch even more so). This would 
be a good project to look at: what would be a great starting version and UI of a 
Smalltalk-like language for beginners?


Alan Kay
Still trying to learn how to think better - 5y


What is one moonshot project (as consequential as the ARPANET) should 
the US government undertake today? 


First, I think that “moonshot project” is not the metaphor you really want to use here:
the actual “moonshot” was a stunt that set back real space travel by at least 25 years,
and is now looking like more than 50. Also, the government — because of the poorly
educated nature of the majority of Americans — is all too prone to “stunts” and other 
“gestures” which lack the substance and the legs to be worked out properly. 


By contrast, the ARPA-Parc research was done under the radar of public (and 
government for quite a while) scrutiny: for 12-14 years (1962 to 74-76) to yield the 
big workable inventions for universal personal computing and networking. This was a 
very good process and use of (not a lot of) public funds. 


A “grand challenge vision” today for the US that seems desperately paramount would
be of a not too far future American citizenry who are above threshold with knowledge 
of their century and centuries past, and with the perspectives and skills to argue for 
illumination and progress rather than to try to win. 


As my friend and colleague Vi Hart once beautifully wrote: “We must ensure that 
human wisdom exceeds human power”.


The first stage with visions is to see if they are at all doable, and this one is. The 
second stage — a very difficult stage in design — is to identify the “MacCready sweet
spot”: the place that is above the qualitative threshold to be able to really turn the 
vision into goals and problems to be worked on, that is also the most reachable. 


There are many issues to be contemplated — and these are outside of your question. 
But consider that most Americans don't want to acquire qualitatively better education, 
and most teachers, especially in the elementary years cannot help with qualitatively 
better education. 


In analogy to the ARPA computing initiatives, the first — no one wants personal 
computing or pervasive networking — was dealt with by not trying to get any kind of 
agreement, but simply trying to come up with the inventions to do them, and then see 
if they could spread (the former was easier than having the ideas spread unscathed — 
I think more education about them was needed).


For the second, I've written and spoken about what could be done in analogy to how 
the printing press changed much of Europe and then America by being able to take 
the semi-capture of great minds and thoughts through writing and then carpet bomb 
society via mass publishing (the very start of the industrial revolution and the current 
civilization was the inexpensive replication and distribution of ideas). This found 
enough auto-didacts and changeable teachers to make the difference through quite a 
few generations of general society. 


I think partly what is needed in analogy to deal with the current difficulties is the next 
logical stage of “the book” as it could be using the computer medium to carry it. This
is a larger discussion than can be had here, but consider that “a book” today could
teach reading itself, could help the reading of it, could be much much more than just 
words and pictures — or movies and sounds — but instead an entire learning 
environment. This does not solve enough motivation problems to work, but could 
make the difference in society-enforced schooling by augmenting the adults who are 
trying to help — “helping the helpers”. 


My vote for the greatest “invented curriculum” is Jerome Bruner’s “Man: A Course Of
Study”, which was created in the 60s with lavish funding and great contributors. This 
was killed in just a few years by a combination of politicians and parents who used it 
to further their own political and ideological agendas. Today, it would be worse. A 
terrific book about this is by Peter Dow: “Schoolhouse Politics: Lessons From The
Sputnik Era”. Dealing with the backlash against “real education” is ultimately the
most critical “MacCready Sweetspot” issue.


In the end, most children try to learn what is in the environment around them, 
especially at home and school. To make a qualitative difference in education (and
before both the societies of the world and the world itself collapse), the
environment for children must be qualitatively better understood, and qualitatively 
better changed. 


Alan Kay
Still trying to learn how to think better - 5y


Why couldn't the management of Xerox see the goldmine that they had 
with inventions at Xerox PARC? Was it really about lacking money to 
capitalize? How about stock issuance? 


Both Giordano Beretta’s and Chuck Brantingham’s answers are good, so I'll just add a 
bit more context. 


The main sparkplug of Xerox was Joe Wilson, a very special guy who turned over
the reins to his protege Peter McColough in the late 60s, who in turn set up PARC
as Xerox's long range research center. There was a lot of vision in Wilson, and a fair 
amount in McColough. He wanted — rightly — to get Xerox into computing, but they 
were not diligent about the company (Scientific Data Systems) they chose to
buy with their burgeoning billions from the success of the Xerox copiers in the 60s. 
Now as Xerox Data Systems, they were eventually forced to close and write off a 
billion dollars ca 1974-75. Parc also almost was undone by this (a long story about 
how it survived). McColough was removed as CEO, and many problems for the 
company followed. This is worth checking out. 


The first time when enough had been done at Parc to really start thinking about 
commercializing the big inventions was ca 1976, and 


(a) Xerox was in trouble from various quarters, including new competitors at both the 
high end (IBM and Kodak) and the low cost end that they had never gotten into (the 
Japanese) 


(b) the state of the Moore's Law revolution allowed personal machines to be made 
and sold, but at prices which required them to be highly valued (or with low prices for 
machines that couldn't do enough in an office; e.g. Apple started this year).
However, several standalone workstation machines for office text editing using 
displays were starting to appear, and were being bought by companies: for example, 
the Vydec machine, which was about $15K. A large percentage of the cost of such 
machines was the memory, and this was the first technology to really be changed by 
Moore's Law. 


(c) Xerox really thought of itself as a copier company, not as an “information 
company” as McColough wanted, or a software company (which was the key to having 
future vision: SW + Moore's Law = a very different kind of future). 


At that time Xerox was a bit like England before the Magna Carta, with very powerful
P & L divisions, which contended with each other. Parc, as a research center, was just an
L center, and small to boot. One of the arguments was that a full-fledged general 
purpose personal computer would have pretty much the same components as a 
standalone display word processing machine, but could take on a much wider variety 
of tasks. The evidence was the Alto computer plus Bravo (now known as Microsoft 
Word), and numerous other apps that were used daily, the GUI, and a really interesting 
system, way ahead of its time: OfficeTalk — which was a prototype and model of an 
integrated office system in the manner of Smalltalk. 


The Alto had been purposely done to provide a personal computer in the early 70s 
that would be like a "Moore's Law” computer in the 1980s. This allowed the advanced 
SW — including the user interface frameworks — for individual users to be invented 
over some years of effort. This plan worked well. In the end Xerox could never wrap its 
head around Moore's Law and its implications. (Note that IBM really couldn't either — 
or they would never have farmed out the OS for the IBM PC in the early 80s — they 
never were able to seize this huge inflection point in computing.) 


Xerox did recognize the laser printer after it was made (worth following up how Gary 
Starkweather was “banished” to Parc for pushing the idea in Rochester — we loved 
him!). But again they could only deal with the larger aspects of it for corporate 
publishing. However, they made billions from it, enough to pay for Parc well more 
than 100 times over. 


The Japanese — e.g. Canon — pushed for an affordable desktop version, and this is 
what Apple eventually offered with the Mac. 


What were lacking — pretty much everywhere outside of Parc — were large visions of 
the future. For example, Steve Jobs liked the GUI but didn't understand the networking or
how the programming was done. So the Ethernet eventually came out as an 
agreement between Xerox, DEC, and a newly formed company. Apple never 
understood the Internet despite being told over and over about it. 


As another example, one of many things that really needed to be done to 
commercialize the Parc suite of HW-SW was to make special purpose silicon CPUs to 
be able to do what the Alto could do with higher level language processing. This did 
not happen. Intel made an attempt with the 432 but tried to do too many features. 
The most impressive special purpose chip done in the early days of the 80s was the 3D 
graphics GPU done by Jim Clark — used for the Silicon Graphics machines — and first 
worked out at the ARPA project at Utah, Stanford, and then Xerox Parc (using the 
Mead-Conway design and fab process done there).


Apple later had a chance to acquire Silicon Graphics — as the "7 years in the future 
Mac" — to get on that learning curve (another “Moore's Law” curve that they didn't 
understand — the workstations of today are the commodity personal computers of 
the future — but they passed up that chance). 


So ... “something” did happen in fits and starts and in different places. But it really 
wasn’t as well thought out as the integration of hardware, software, user interfaces,
and networking that was the hallmark of the ARPA-Parc research community. And we
are paying for the willy-nilliness and “worse is better” craziness to this day.


My conclusion from being a part of this and watching it happen over decades is that 
most people, most companies, etc., are very uncomfortable — and unequipped — for 
large and accurate visions, and this included both Apple and Microsoft. 


(a) They generally prefer to take their risks where they think they are low and where 
they think they know something, rather than to develop a background that will give
them a stronger way to see into the exponentials that have been the hallmarks of our 
era. 


(b) They are also uncomfortable with “baseball” rather than “golf”. In baseball you are 
doing well if you hit .350, because hitting a round fast thing with a round fast thing is 
very difficult. The 65% of the time you don't get a hit is not “failure” in baseball but 
*overhead* for getting the .350. (Failure in baseball is failing in technique, such as not 
catching a flyball, you should do this successfully more than 98% of the time — the 
analogy here is that you should be successfully able to build any hardware or software 
system you design.) ARPA-Parc — via Licklider — was sure that given the vision and 
goals, batting .350 or .400 would qualitatively change the world, and that the
other 65% was “overhead” for qualitatively changing the world. This is what happened. 


This idea really works — it’s another exponential — but businesses and most people 
don't understand it at all, or well enough to have faith in it. Clay Christensen’s book
“The Innovator’s Dilemma” is not very good on a number of counts, but one of the 
worst aspects of it is that he excuses higher management for not doing well with new 
ideas — because they are new, because they could compete against the existing lines 
of business, etc. — but my view is “what else is higher management really being paid 
for if not for the knowledge and chops to deal with the future?” — and the context is 
that it is the 20th and 21st centuries that must be understood in order to have a 
chance at being part of and inventing the future. Higher management should not be 
excused for “flunking their century”!


Alan Kay
Still trying to learn how to think better - 5y


What are some great stories from Xerox PARC? 


There are quite a few online, and this would be appropriate because the ARPA-Parc 
research community put quite a bit of effort into inventing personal computing and 
pervasive networking so that people by themselves and with their friends and 
colleagues could find out important and interesting things online. 


So: I first typed “Alan Kay History” into Google, and on the first page found “The Early
History of Smalltalk” on Bret Victor's site (I wrote that in 1993), and it has quite a few
good stories about Parc in it. 


I was surprised that “The Power of the Context” (2004) didn't come up on the first
page, but this can also be found via Google (try “alan kay the power of the context”). 
This one is especially useful because the references are pretty extensive, and many of 
them have online links. 


One of the references there is Mitchell Waldrop’s “The Dream Machine”, which is by 
far the best history of the larger streams that created ARPA and then Parc. Lots of very 
good stories here, most of them at least “more or less true”. Patrick Collison tells me 
that Stripe Press is going to reprint this very worthy book. 


Two other references that are vital — and have a few good stories in them — are the 
histories by Butler Lampson (Parc’s “Oppenheimer” and more), and Chuck Thacker 
(Parc’s hardware genius and all around genius). These are both available online. 


Then I tried “xerox parc history” with Google. This was more random (e.g. ignore
Malcolm Gladwell's “imaginings” in the New Yorker), but I did find on the first page
“Xerox's Legendary Palo Alto Research Center — IEEE Spectrum". This is a terrific piece, 
mostly by Tekla Perry, that takes a sharp look at the interplay of personalities etc.
there.


There's quite a bit more, but this is the way I try to answer questions using the
technology that is there. It's surprising that I can't explain what I'm looking for to even
a semi-AI yet, but just a little digging will find pretty good primary material.


Alan Kay
Still trying to learn how to think better - 5y


Alan Kay, could you give an example of when software development has 
driven hardware development? 


One way to think about how software helps computing is that it introduces “late-binding”
into machinery. The very idea of a “changeable stored program” allows early-bound
mechanisms to take on different purposes. I’m a very big fan of the Jacquard
loom — not just for its ability to be driven by different programs — but for the 
wonderful tolerant engineering that Jacquard devised to allow it to work so well so 
early in the industrial revolution (ca 1800). (A lot can be learned about “how to think 
like a great engineer” by studying this design.) 


The idea of “registers” to hold parameters is another late-bound idea (compare to the 
many dials that had to be set by hand in e.g. ENIAC (which also had a few registers)). 


Putting a Jacquard-like program in memory is a very late-bound idea; I like Turing’s
formulations as an example, and then the Goldstine-von Neumann adaptation of this 
idea for getting modern computing going. A big problem was how to make enough 
volatile memory to do anything. Some of the first attempts were acoustic delay lines 
(using mercury), Williams tubes (using persistence of charge on CRTs), rotating drums 
(later), and finally the invention of core memory by Forrester for Whirlwind in the early 
50s. 


A great idea that has been used off and on — and should be available to 
programmers on more CPUs — is microcode, invented by Maurice Wilkes in the 50s to 
allow late-binding of how basic instructions and other low level features could be 
made with the hardware. 


Early machines did not have index registers — they are another example of late- 
binding something. They did not have stacks to hold return addresses or previous 
contexts. They did not protect memory from random or malicious errors. Etc. They had 
essentially no features for the advent of higher level languages such as Algol in the 
late 50s and early 60s. 


A machine architecture ahead of its time — and still today — was the Burroughs 
B5000 in 1961-2 by main designer Bob Barton — which was entirely driven by 
software needs. These included virtual memory and exchange of segments with a 
swapping drum, fine grain “capability” protection of everything, multiprocesses and
automatic process switching — thus multiple CPUs — a comprehensive stack structure 
that could hold procedure activation records as well as return addresses, and also 
intermediate results of expression computation. Compact “byte-codes” for instructions 
fetched 4 at a time. Etc. 


Besides all the automatic processes implied by the above paragraph, let me mention a 
few that might not be so obvious. A simple one is that the protected (by a tag bit) 
“descriptor” for any kind of object contained useful information beyond just the 
address: if an “array” it would have the length of the array, and this would be 
automatically checked in parallel against the index to be used so it was impossible to 
store outside of an object. Another one is that a procedure descriptor could be stored 
where an array descriptor was expected, and the right thing would happen for both 
fetches and stores (i.e. the B5000 could simulate “objects”, “data”, etc.). 
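
The descriptor idea in the paragraph above can be sketched roughly in Python. This is only an illustration, not B5000 code: the class names, the `fetch`/`store` methods, and the `sum_first` helper are all invented for this example, and the bounds check here runs sequentially rather than in parallel with the access as the hardware did.

```python
class BoundsError(Exception):
    """Raised when an index falls outside the described object."""


class ArrayDescriptor:
    """Hypothetical stand-in for an array descriptor: the data plus its length."""

    def __init__(self, length):
        self.length = length
        self._cells = [0] * length

    def _check(self, index):
        # Every access is checked against the length carried by the descriptor.
        if not 0 <= index < self.length:
            raise BoundsError(f"index {index} outside 0..{self.length - 1}")

    def fetch(self, index):
        self._check(index)
        return self._cells[index]

    def store(self, index, value):
        self._check(index)
        self._cells[index] = value


class ProcedureDescriptor:
    """Hypothetical procedure descriptor: fetches and stores run code instead."""

    def __init__(self, on_fetch, on_store):
        self._on_fetch = on_fetch
        self._on_store = on_store

    def fetch(self, index):
        return self._on_fetch(index)

    def store(self, index, value):
        self._on_store(index, value)


def sum_first(descriptor, n):
    """Caller code that only fetches; it cannot tell data from procedure."""
    return sum(descriptor.fetch(i) for i in range(n))
```

Passing either kind of descriptor to `sum_first` works without the caller changing, which is the sense in which a procedure can stand where an array was expected.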


The “byte-coded virtual machines” for higher level languages that have been used for 
many years are emulations of some of the features of the B5000, but on the B5000 
they ran at top “machine speed”. At Xerox Parc, we — Chuck Thacker was the main 
designer — made these machines for our languages by making microcoded 
computers where the microcode ran many times faster than main memory could cycle 
(this allows pure hardware speeds for language emulation to be obtained on many 
occasions). 


Another example can be drawn from the Parc architectures: multiple program 
counters with zero-overhead task switching. This was first done on the TX-2 of Wes 
Clark at Lincoln Labs, famous among other things for being the computer that 
Sketchpad was done on. The Parc Alto had 16 program counters, each pointing into 
the microcode. Attached to each program counter was “lookaside” logic tied to 
various conditions in the hardware represented by a bit, for example: a disk sector just 
passed, the CRT retrace pulse just happened, a two millisecond tick just happened, the 
ethernet card had something of interest, etc. A simple scanner looked to see which 
was the highest priority program counter and selected it for the next microinstruction. 
Thus up to 16 microcode tasks could be seamlessly interleaved using a (carefully) 
shared register file. This allowed most normal hardware to be eliminated — the 
normal controllers for display, disk, keyboards (including music keyboards), pointing 
devices (including the mouse), other I/O, etc. were all emulated. So was “BitBlt”, the
screen painting primitive. So was the Smalltalk virtual machine (and other languages’ 
virtual machines), etc. 
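
The scanner scheme described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Alto's actual microcode: the function name `run_tasks`, the shape of the task lists, and the rule that a woken task stays ready until its instruction list runs out are all made up for illustration.

```python
def run_tasks(tasks, wakeups, steps):
    """Interleave microcode-like tasks by priority, one instruction per step.

    tasks:   list of instruction-label lists; index 0 is the highest priority.
    wakeups: dict mapping a step number to the set of task indices whose
             hardware condition bit is raised at that step.
    Returns the trace of (task_index, instruction) pairs actually executed.
    """
    pcs = [0] * len(tasks)   # one program counter per task
    ready = set()            # tasks whose condition bit is currently raised
    trace = []
    for step in range(steps):
        ready |= wakeups.get(step, set())
        # Ignore tasks that have no instructions left to run.
        ready = {t for t in ready if pcs[t] < len(tasks[t])}
        if not ready:
            continue
        current = min(ready)  # the "scanner": lowest index is highest priority
        trace.append((current, tasks[current][pcs[current]]))
        # Each task keeps its own counter, so switching tasks costs nothing.
        pcs[current] += 1
        if pcs[current] == len(tasks[current]):
            ready.discard(current)
    return trace
```

For example, with a low-priority emulator task running and a disk task woken one step later, the disk task's instructions are slotted in immediately and the emulator resumes where it left off.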


(This is how we were able to make an affordable buildable personal supercomputer in 
1973 — eventually almost 2000 of these were built in the 70s. And this is why the 
large accomplishments in software that happened at Parc were possible: the software 
needs drove the hardware designs very very deeply.) 


Much later, the software needs for graphics rendering eventually — a terribly long 
time except in the Silicon Graphics machines (done by an ARPA guy and pioneered at 
Parc) — resulted in various GPUs. 


A reversal of the progress made — but still to your question — is when a low level 
language — C — was taken as “popular” and hardware started to be made to cater to 
this. This started to hurt development of really high level languages, and continues to 
this day. (I should note here that having a universal intermediary language as a 
universal target for higher level semantics is a very good idea — and this represents 
something that could be assisted by hardware — the problem is that C is not very 
good at representing what needs to be modeled for advanced computing ...) 


Just to pick one of many crazy things in today’s CPUs: it's worth looking at just how 
slow is a full process switch in most commercial CPUs these days. Combine this with a 
very poor way to think about MMUs (things are different than they were in the 60s!) 
and it's hard to make hundreds of thousands to millions of protected processes. There 
are enough gates to do a very different architecture that would be much more safely 
parallel, etc. This is quite missing what is actually needed for various scales of 
computing. 


Interesting hybrid architectures — e.g. Intel has some — are CPUs that combine a "C 
based" processor — like an ARM chip — with FPGA programming. Much can be 
accomplished here if the memory available has high enough bandwidth and capacity. 
Intel tends to do this poorly. 


Another miss was to take a good idea — the “Harvard” two cache architecture — and 
make it too small to do a good emulator. How to do this well is something that 
readers should try to work out for themselves. 


Another thing worth pondering is how to get enough memory bandwidth to multicore 
architectures to make them really worthwhile. 


I think it's fair to criticize the current lashups of hardware available today as being kind
of randomly helping or hurting software. From the perspective of many decades, it 
seems more like the hardware vendors are mostly coming up with designs that make 
their own processes easier rather than being interested in software development. 


Alan Kay
Still trying to learn how to think better - 5y


What did Steve Jobs mean when he said that Apple is fundamentally a 
software company? Was he right? 

My slogan in the Xerox Parc years was: “Hardware is just software that is crystallized 
early”. In other words, the proper business of computerists is dealing with process and 
processes. Large ideas will lead to software architectures, UIs, languages, and
development systems. At some point the necessary but least convenient parts of a 
system — the hardware — will become clear. 


This was a reflection on the ARPA computer research community I grew up in (and its
predecessors in the SAGE project that started with Whirlwind). This community was 
constantly coming up with new process ideas, simulating them on existing hardware, 
and then designing and building new hardware that better fit what they wanted the 
processes to do. 


One way to think about this is that the job of hardware is to optimize software 
(including getting it running at all). Unfortunately this was never made clear enough to 
Intel and Motorola — despite a lot of efforts on our part in the 70s. So we have the 
unfortunate backwards dance of HW vendors producing CPUs etc and getting the 
software people to make it look good. 


This was one of several reasons that we decided at Parc to make all of our own SW 
and HW — and did. This allowed us to simulate the software and its environments 
beforehand, and then to design and make machines that would be efficient in 
supporting the ideas. 


The CAD -> SIM -> FAB cycles are in general used in most serious engineering fields, 
but serious engineering in computing is still lagging (e.g. note how long it took for PC 
vendors to finally include a GPU, etc). 


Steve didn’t understand this very well in the early days of Apple, but could see that the 
Parc GUI was a SW idea that forced new HW (the Lisa and Mac). As he said in a later 
interview, he missed that the way we programmed also required new HW (we did, 
Apple didn't), and so did the way we networked things together (ditto). 


I think his stint at NeXT got him to understand SW a little more, and the Pixar folks 
really understood it. So he had the idea more strongly when he came back to Apple 
(hence his use of another slogan of mine that he liked: “People who are really serious
about software should make their own hardware”.)


Alan Kay
Have designed a few programming languages - 5y


How much percentage of code is shared between software projects, e.g., 
between a video game and a word processor (in terms of STL, OS, etc)? Is 
this percentage comparable to the percentage of genetic similarity between 
mammals? 


One answer to this question can be gleaned by asking yourself how many different 
text editors you have to use in a week (for example, right now I'm using the Quora 
editor, I use one for email, another for when I have to chat, and others for writing
papers, commenting in blogs (Quora has a different UI for their comment editor ...), 
etc.). 


They do share some properties, for example, they all use the “modeless” text editing 
that was invented at Xerox Parc. But e.g. the Quora editor, like many other editors found in
the browser, is not modeless at the next level (right now I'm in an editing mode and I
will have to hit the “Submit” button to even see how what I'm typing right now is
going to look; this was done away with at Parc (i.e. WYSIWYG), but separate
presentation modes have crept back). What can be embedded or not is different, etc. I
often have to attach “really rich text” documents to emails because the email editors 
are weak. 


And so forth. 


Now we have to look at the systems aspects and about the underpinnings. If the 
editor were a “real object” that was viewed by a “real view” the same text/editor could 
be used everywhere (this was done in Smalltalk at Xerox Parc in the 1970s). 
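
A minimal sketch of the “real object” viewed by a “real view” idea, in Python. The class names and the `attach`/`refresh` protocol are assumptions made for this example and are not Smalltalk's actual classes; the point is only that one text object can sit behind any number of views.

```python
class TextModel:
    """A single shared text object; attached views are told about every change."""

    def __init__(self, text=""):
        self.text = text
        self._views = []

    def attach(self, view):
        self._views.append(view)
        view.refresh(self.text)

    def insert(self, position, string):
        self.text = self.text[:position] + string + self.text[position:]
        # Every view is updated at once, so there is no separate "preview" mode.
        for view in self._views:
            view.refresh(self.text)


class PlainView:
    """Shows the text as it is."""

    def __init__(self):
        self.rendered = ""

    def refresh(self, text):
        self.rendered = text


class UpperCaseView:
    """Shows the same text in a different presentation."""

    def __init__(self):
        self.rendered = ""

    def refresh(self, text):
        self.rendered = text.upper()
```

Because both views hang off the same `TextModel`, an edit made through one context shows up everywhere the text is embedded.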


If you don't have real-objects/real-views, and don't have late-binding, etc., — i.e. if it is 
difficult to write and use real systems in one’s programming environment — it can be 
quite a chore to simply embed standard components and be able to use them in an 
integrated fashion. 


It's puzzling and annoying that this is still the case in 2018! 


Alan Kay
Still trying to learn how to think better - Updated 5y


Did astronauts on Apollo have the means to kill themselves in case their 
mission failed? 


I was in the Air Force in the early ’60s, and part of the flight training then (and likely
still today) was to experience various effects of high altitude in an “altitude
chamber”. The one we had at James Connally AFB in Waco, Texas would hold about 25
people. One of the drills was to buddy up and have one of the pair take off their mask 
and copy some writing from a book. This was done at 18,000 feet, 25,000, 35,000, and 
45,000 feet. 


The experiences were the same at each altitude. One remembered writing, and the 
next thing one remembered was the buddy holding the mask over your face. The 
writing would be OK for a bit and then would turn into a scrawl. There was no memory 
of the change. 


I don’t remember the intermediate times but 18,000 feet took about 2 minutes, and
45,000 feet was about 20 seconds. 


The reason there was no sensation is that our bodies can't sense low oxygen (some 
animals can) but we sense only high carbon dioxide (which makes us gasp). But under 
high altitude conditions there is no gasping because the partial pressure of carbon 
dioxide allows it to continue to exchange out of our lungs. Meanwhile there is not 
enough oxygen entering. 


The purpose of this was to scare the shit out of us so we would constantly check to 
see if oxygen was flowing while we were flying. (It worked!) 


The other memorable experience with the altitude chamber was explosive 
decompression. This was done four people at a time in the airlock to the larger 
chamber. They'd take the large chamber up to 45,000 or higher and then suddenly 
vent the airlock to it. One of the warnings was to have your hands in front of your face 
to catch your mask as the air in your lungs would blow it off (despite the straps). 
Another was to have a bowel movement before doing this! One of the jokes was that 
you could tell the highest rank that had been in the airlock by how bad the smell was. 


I wonder now whether they still do this in quite the same way as in 1962. Both of the
exercises are moderately dangerous: they let us pass out four times from lack of
oxygen, and the explosive decompression could badly damage one’s inner ears, etc.
Another exercise that they might not do these days is the ejection seat simulator,
which is quite a whack (some ejection seats could pull enough Gs to compress one’s
spine, etc.).


As to the question, regardless of how much astronauts might strive to fix the situation, 
they would have all gone through these experiences in the ‘60s and would know that 
just lowering the partial pressure of the air they were breathing would accomplish a 
completely sensation-free send-off. 


Alan Kay
I'm the “Alan Kay” in question (try Google for the usual misinformation) - Updated 5y


What's the best book about Charles Babbage in terms of technical detail? 
I would also vote for the Swade book.


In a computer design course I taught many decades ago, I asked the class to design
the computer that Captain Nemo would have had on the Nautilus (ca 1865). 


A few students tried the Babbage route with mechanics. Most went for electrical 
schemes involving relays (Morse’s telegraph was ca 1845). As far as I can recall, no one
thought about pneumatics and “fluid logic”. This technology also started developing
around the 1840s, e.g. with the “Barker lever” force amplifier for large pipe organs.


It's worth pondering what each of these technologies brings to the table, and also 
their difficulties. 


I'll just point out here that a big problem with most mechanical schemes is to do what 
is called “restoring logic” so you don't have to have the signal also carry the 
propagating power needed. I once got a chance to actually crank the London Science
Museum Difference Engine, and it just barely works; it is at the limit of what you can
drive mechanically. 


Danny Hillis told me that after he and Brian Silverman did the TinkerToy computer to 
play TicTacToe — which had a hard time working — they realized that they should 
have tried to find some way to do mechanical restoring logic — it was just too hard to 
drive signals (in old computer logic terms, the “fan out” was much too shallow). 
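
A toy numeric model may help show why restoring logic matters. It is only a sketch: the function name `stages_survived`, the 20% loss per stage, and the 0.5 threshold are arbitrary assumptions, not measurements of any real mechanism.

```python
def stages_survived(stages, loss_per_stage, threshold=0.5, restore=False):
    """Count how many stages a full-strength signal passes before it is too weak.

    Each stage loses a fraction of the signal. Without restoration the losses
    compound; with restoration each stage re-drives the signal to full strength
    from its own power source, provided the input is still above the threshold.
    """
    level = 1.0
    for stage in range(stages):
        level *= 1.0 - loss_per_stage
        if level < threshold:
            return stage      # stages completed before the signal was lost
        if restore:
            level = 1.0       # the gate supplies fresh power to the next stage
    return stages
```

With a 20% loss per stage, the unrestored signal dies after three stages, while the restored one passes through any number of them.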


By contrast, the Jacquard looms do not have to do a lot of work. The sensing and warp 
lifting is done by pressure against restoring springs (of the “needles”) and lifting
weights (on the warp threads). 


The pressure needed to apply the “cylinder” (a rectangular solid with holes in each of 
the faces) with a card on top against the pins that are connected to the springs is 
minimal. The main work is to move the “cylinder” back while advancing a card. This 
can easily be done by a single person or by an engine. 


If you look closely at how a Jacquard loom works*, you will be struck by how few 
complications he wound up with (one of many wonderful principles was to use flexible 
yarn to route pull-up forces rather than to try to use more normal mechanical means, 
e.g. the “trackers, stickers, squares, roller-boards, etc.” of the pipe organs of the day).
The mechanisms were almost flimsy — not really, perhaps: “no more than absolutely 
needed” — but were also very easy to replace and fix. One of the main principles 
everywhere is “large tolerances” with “registrations when needed” — this works 
extremely well! 


If we compare the two designs, the Jacquard loom is “light, airy, easy, graceful, 
flexible, tolerant, beautiful, aikido, fixable, etc.” (and works!), and the Babbage
mechanisms of the Difference Engine were quite the opposite in every regard. (A 
wonderful charette for engineering design students!) 


Babbage correctly saw that the Jacquard punched cards were a great idea, but he 
wanted to apply these to the mechanics of the Difference Engine. Almost certainly, 
what he should have done is to start with the Jacquard idea and think the whole rest 
of his engine through again from scratch (there are some interesting implications in 
the “logic” that Jacquard invented to make simple decisions, and of his highly tolerant 
way to do the mechanics ... ). 


Ada's famous “Notes” on her translation of a paper about the Analytical Engine were 
written in 1843, right at the time when there were more technologies that could have 
been considered ... 


Relays give you a way to do restoring logic when you need it, but if you take a look at 
Aiken's Mark I at Harvard, or Stibitz's complex number calculator, also done with
relays, it is not a good route to making a computer. 


Fluidics on the other hand can be done by etching glass plates. The gates can be 
small, it is easy to do restoring logic, it's faster than relays, etc. Easy to do a display, 
etc. I.e. Wow!


* I realized belatedly that I should give as reference the excellent book from which I
learned the details of the Jacquard loom: (1888) The Jacquard Machine Analyzed and
Explained, by E. A. Posselt. Thanks to the wonderful Brewster Kahle and his
Archive.org, this book can be found and downloaded via Google (it was a lot harder
for me to run down this book years ago ...).


Alan Kay
Still trying to learn how to think better - 5y

What were the best lessons learned at PARC’s Learning Research Group? 
Too big a question for Quora-sized answers (I would have to explain what we did and
how we did it, and the last time I did that it turned into a 50 page history for the ACM).


But one thing I did learn that doesn't require the historical knowledge to understand
is that the way to “let the completers complete” and to keep working on new ideas is 
to form new groups as needed. 


Most ideas are mediocre down to bad, but the few that are better than that are still 
easier to have than to do serious software engineering to make the previous ideas 
real. Many ideas arise from seeing the old ones more clearly and this often happens 
during development. 


I made the mistake at Parc of trying to get my research group to switch to new ideas
too often and didn't deeply understand enough that as “completers” they really 
wanted to complete (and it is very good that they did over and over because it was the 
goodness of the actually working stuff that made the most impact on the rest of the 
world). 


Even so, this stellar research group did 5 different Smalltalk systems — all working — 
in a little less than 10 years, one following the next every two years. 


The way to solve the problem of ideas in the theater is to just do a new play with a 
new company, and you wind up with several plays that are worth pondering that can 
cause further ideas to flow. Later I realized that this path could have been taken in our
group.


Alan Kay 
I am the Alan Kay in question. - 5y


Can someone link me to the original white paper discussing what was 
eventually coined OOP? I believe it was by Alan Kay, but I'm not able to find
it simply in a Google search. I'm finding it referenced, but not linked to, and
I don't recall the name.


Depends on “which OOP” you mean. The kind of language that C++ is, was first 
created in Simula I, in the mid-60s, and Nygaard and Dahl wrote a paper about it in
the CACM in 1966. They reference an earlier document they wrote in Norway in
1965. 


The idea of associating procedures and data together goes further back, and has 
appeared independently several times. You can read about some of these schemes in 
my “Early History Of Smalltalk”.


I saw some of these when I started programming in the Air Force ca 1961, but I was
particularly struck by Ivan Sutherland's Sketchpad system (his thesis was MIT, 1963, I
read it in Nov 1966). It was like magic. A week later I read the Simula documentation,
including the paper above, and realized that Simula had a similar structure to 
Sketchpad, wasn't as nice, but did have the open-ended feature that you could write 
programs — even kludges — to make things happen. 


What I called “OOP” came from the realization that (a) a “real object” should be
semantically equivalent to a whole computer, (b) that this would be perfectly universal, 
(c) that if messaging were used — as was contemplated in the ARPAnet — enormous 
scaling was possible, (d) that everything should be an object (for many reasons), and 
so on. Besides many analogies to networks of computers, and processes within time- 
sharing systems, my background in biology got me thinking about -real scaling- (far 
beyond the means of computing at that time), and my background in math got me 
thinking about “algebras” of communication. 


This wound up with an image of something rather different from Simula, and created 
quite a few years of work on both designing HW-SW to run something this universal, 
fast, and small enough. 


Alan Kay
Worked at Apple (company) - Updated 4y
How would Apple have benefited if Steve Jobs were technical? 


This is a very complicated question to answer in a fair and comprehensive way. Let me 
apologize in advance for its length. 


Some context: Steve and I were friendly from after the Parc visit until he died. I think
that he can't be too highly praised for many of his attributes, especially courage, 
determination, communicative abilities, and will. 


First, we have to ask a question about “technical”. Technical as in being able to fix a
computer, technical as in Woz, or technical as in Chuck Thacker? Or as in an expert in
computer products, or as in a skilled programmer today, or as in Dan Ingalls or Butler
Lampson?


Here's Steve in an interview in 1995, years after the visit to Xerox Parc and then the 
Lisa and Mac, talking about the visit to Parc:


(This URL seems to work in both the US and the UK, the segment I’m referring to 
starts around 6:35. Perhaps someone can fix this link if it doesn’t.) 


We can see that what appealed to him was what he could see — basically the bit-map 
screen and the Smalltalk GUI — and the other really important systems aspects — 
such as networking and how and why we did it, and advanced programming and how 
we did it — made no impression. 


(Note also that he catches himself praising Parc too highly and starts downgrading his 
enthusiasm. To get an idea of what he saw, take a look at this video tribute I did for
Ted Nelson a few years ago using the genre of Smalltalk that Steve saw in 1979: a
rescued Smalltalk from 1978 that we brought back to life ca 2014 — the main 
difference here and what he saw is that more memory today allowed more bit-map 
pictures to be used — but he did see the painting system and that exact elf, that 
graphic system, the multiple desktops, etc.)


Years later at NeXT, he was working with people “more technical” than most of the 
people at Apple in the late 70s and early 80s — people who knew a lot more about 
both hardware and software, about operating systems and objects, etc. The result was 
a more technically sound product, many of whose fruits were brought back to Apple
when Steve returned. 


So Steve wasn't “technical”. 


However, a person even more important to personal computing than Steve was Bob 
Taylor — one of the four key ARPA funders, and the person who set up Xerox Parc. 
Bob was also not technical. He was a psychologist who had the wonderful disease of 
wanting interactive computing for everyone, the same disease that Steve later 
contracted. Bob was also charismatic and a person who made things happen. 


The biggest difference between them was that Bob put a lot more care into deciding 
what technical people to trust with the problem finding and the ways to make 
progress. This was partly because of Bob's vantage point of being a researcher at 
NASA, and then funding people like Doug Engelbart and Ivan Sutherland at ARPA. 


The result of Steve being Steve was that some not good technical decisions were 
made by people he trusted — and some by him — mixed with some better technical 
decisions — and then he was able to get many of the conglomerations successfully to 
market (the latter should never be discounted — it is really difficult, and Steve can't be 
too highly praised for getting the stuff into manufacturing and the marketplace). 


But in the end being able to replicate an idea is one thing, getting the worthwhile 
ideas to manifest in the first place is quite another — both require super high quality 
people — and rather different people. 


Another complication that needs to be mentioned for context is that the start of the 
70s and the start of the 80s were two completely different times and situations. No 
one else was working on powerful personal computers in 1970, but in 1980, there 
were already quite a few 8-bit micro hobby computers (including the Apple II),
quite enough to get IBM interested enough to enter the market in a big way in 1982.


We knew in the late 60s pretty much how much computing power was going to be 
needed to pull off a “real personal computer” with a generally useful screen and GUI, 
and that it would need at least a 16-bit processor, etc. Moore's Law calculations said 
“by the end of the 70s or into the 80s". The Parc Alto was made to “compute like a 
decent personal computer of the mid-80s” but starting in the early 70s. This gave us a
head-start on the much harder design and software system building problems
required for a real personal computer. Most people don't realize that the Alto and 
Smalltalk had been running for more than 6 years when Steve finally saw them for the 
first time. Nor do they realize that there were thousands of Altos by then, connected 
by Ethernet, with file servers, page a second laser printers, a Parc “Internet”, etc. Steve 
was shown the whole suite of things needed for real personal computing. 


One of the key factors of the 80s to the present day is the combination of competition 
and a generally unsophisticated market. This means “early to market" is a big deal 
(because something really good down the line won't be valued enough to displace the 
poorer early movers). This makes it really difficult to work on “important good things 
that take time”. The lure of “sell now” tends to kill off “what is actually needed”. This 
happened in most areas of hardware and software, and created poor de facto 
standards — and “redefined normal downwards” — to make it difficult to both do 
good things, or even for most people to have the perspective to criticize them. 


Marketing people would generally prefer a product not to have a learning curve, so 
they would much rather cater to what people think they want rather than what they
*need* (what could actually not just really help them, but help to qualitatively change 
the ways and power of their thinking). This leads to “consumer marketing and 
products” and something like a “pop-culture”. 


Sic transit gloria mundi. 


Alan Kay
Still trying to learn how to think better - 5y

What did Alan Kay and Steve Jobs talk about at the 2007 iPhone keynote? 
Steve and I first met during his visit to Xerox Parc in 1979 at the urging of some of his
computer people to see what we'd been doing for the previous 8 years. You can get
an idea of this by taking a look at a revived Smalltalk system from the year before
(1978) that several of us brought back to life a few years ago after its 40 year slumber
on a rescued disk pack that Xerox had thrown away.


I used this to make a tribute for a Ted Nelson celebration. The main difference
today is “more pictures” (we didn’t have enough memory back then to use more than 
a few). We showed Steve many of the demos that are on the Ted Nelson tribute video, 
plus a few others not on the video. 


This subsequently formed the basis of the Lisa and the Mac a few years later (the Mac 
was done in 1984, 11 years after the more powerful and capable Parc Alto, which 
started working in 1973). 


Later, I was chief scientist of Atari for a few years (81-84), and Steve and I would
periodically have lunch. The last year of Atari was a collapse and I eventually accepted
Steve's invitation to come to Apple. 


In 1984, Time or Newsweek asked me my opinion of the Macintosh, and I said “The
Mac is the first personal computer good enough to be criticized”. Internally at Apple,
my first memo had the title “Have I got a deal for you: a Honda with a one-quart gas
tank!”. Steve did not like this memo, but what could he do given the history, and that 
it was quite true? 


Steve and I remained friends (I was the go-between that brought him together with
the people who were to become Pixar). 


I think he invited me to the 2007 iPhone unveiling partly because it was kind of a tiny 
“Dynabook” — and he had always wanted to do one — and partly because he was 
going to use a quote of mine that he had always taken to heart “People who are
really serious about software should make their own hardware”. 


The photo of us chatting was taken right after the event. He brought the iPhone to me,
put it in my hands, and
asked: “Alan, is this good enough to be criticized?”. My reply was to make a shape 
with my hands the size of an iPad: “Steve, make it this size and you'll rule the world”. 


When the iPhone had been revealed a few minutes earlier I realized that they must
already have done an iPad/Dynabook-like machine (easier) and that the “iPhone first”
must have been a marketing/timing decision.


Alan Kay
Have designed a few programming languages - 5y


If a programming language is also written in a programming language, how 
was the first programming language written? 
Originally Answered: If the programming language is also written in programming language, how
was the first programming language written?


Another incomplete offering ... 


It's worth looking at “deep bootstrapping” because it helps gain experience with 
“meta leverage thoughts”. For example, what if there is only one computer with only 
machine code? Here you have to use the bit-patterns that correspond to machine- 
code instructions. What if you are both lazy and ambitious at the same time? I.e. you
don't want to muck around in the bits for any longer than you have to, and you also 
want to make a high level programming language as quickly as possible? 


One way to think about this is that you want to make a few really powerful “bricks” 
without a lot of hand-coding that will then allow bootstrapping something that will 
bootstrap your desired language (and maybe you'll want one more bootstrapping 
phase in there). 


1. One of the many fun and profound things about Lisp is that it only has a few 
primitive “functions” — car, cdr, cons, atom, equal, lambda, read, etc. — and 
most of these are extremely simple and small in any typical machine code. 
One route would be to code these by hand and get a minimal but universal 
Lisp going to then use as the basis for the next levels of HLLs. 


2. Another approach is to make a very simple byte code interpreter (with byte
code implementations on the lines of the Lisp functions) and then use a
technique like that of Meta II (Val Schorre in 1964) to install a simple but
powerful metacompiler written in itself that can bring up further versions of 
itself and of other HLLs (complete examples are given in the very neat paper 
from 54 years ago). 


3. BCPL (Martin Richards) was a transportable systems programming language
used in the 60s (it was the ancestor of “B” and “C” and “Bliss” etc). It was
written in itself in terms of byte-codes, and as above, one wrote the byte- 
code machine by hand, brought up BCPL, which then had the tools to both 
improve itself (write a compiler for BCPL to the new machine) and to 
implement other languages and systems. 


4. The first Smalltalk (-72) was written as a one-page interpreter similar to Lisp 
except the equivalent of “eval/apply” was sending a message to an object as 
a pointer to the message stream (kind of a more general FEXPR), and the
object could use Meta II-like techniques to parse the message on the fly.
Getting this one pager going jumped all the way to a VHLL with dynamically 
extensible syntax. 
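
To make item 1 above concrete, here is a minimal sketch in Python of how small such a core can be. It is only an illustration and not any historical implementation: the representation (Python lists for Lisp lists, Python strings for symbols), the name `evaluate`, and the `GLOBAL` table are all assumptions made for this example, and `read` is omitted because programs are written directly as nested lists.

```python
# Hypothetical minimal Lisp core. Lists are Python lists, symbols are
# Python strings, and NIL is the empty list. Names are illustrative only.

def car(x):
    return x[0]

def cdr(x):
    return x[1:]

def cons(a, d):
    return [a] + d

def atom(x):
    return not isinstance(x, list) or x == []

def equal(a, b):
    return a == b

def evaluate(expr, env):
    """Evaluate a nested-list expression in the environment `env`."""
    if isinstance(expr, str):          # a symbol: look it up
        return env[expr]
    if not isinstance(expr, list) or expr == []:
        return expr                    # numbers and NIL evaluate to themselves
    head = car(expr)
    if head == "quote":
        return expr[1]
    if head == "cond":                 # first clause whose test is true wins
        for test, result in cdr(expr):
            if evaluate(test, env):
                return evaluate(result, env)
        return []
    if head == "lambda":               # build a closure over `env`
        params, body = expr[1], expr[2]
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    fn = evaluate(head, env)           # ordinary application
    args = [evaluate(a, env) for a in cdr(expr)]
    return fn(*args)

GLOBAL = {"car": car, "cdr": cdr, "cons": cons, "atom": atom, "equal": equal}

first = evaluate(["car", ["quote", ["a", "b"]]], GLOBAL)
pair = evaluate([["lambda", ["x"], ["cons", "x", ["quote", ["b"]]]],
                 ["quote", "a"]], GLOBAL)
```

Everything beyond the five primitive functions is the evaluator itself, which is the part that would then be used to bring up the next level of language.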


From the 60s on we have had more than one computer at our disposal, so we can 
think of ways to bootstrap with more tools at our disposal. People write cross 
compilers, but I advocate that to be the last thing done. You are much better off to
bootstrap various interpreters in the early part of the game. 


An interesting wrinkle is to write an interpreter of the target machine on our 
development machine. This makes a lot of sense, because then much can be done in 
the arms of our development tools before having to drop bits on the target. 


I had the fun of doing this ca 1965 when I was programming at the National Center
for Atmospheric Research. The head systems programmers didn’t trust Control Data 
so they were determined to do their own systems software for the not quite released 
6600 supercomputer, and they asked me to write a simulator for the 6600 — on our 
existing CDC 3600 — so it could be debugged. The 6600 was quite radical compared 
to most HW architectures, and it was quite fun to make something that would emulate 
the machine speedily enough (this was written in 3600 assembly code to get 
maximum speed). 


This turned out to be prescient because soon after this there was a crisis 
announcement of “no code from CDC" for the 6600 supercomputers. Those who were 
going to get these machines — Livermore, National Center for Atmospheric Research, 
Sandia, Los Alamos, etc. — pooled what they had done outside of CDC to see what 
they could come up with. We all wound up moving to Chippewa Falls Wisconsin 
(Seymour Cray’s home town, and where he had his lab) for a good part of a year to 
piece together a software suite that would allow the machine to be used. 


The 6600 simulator I had written was pressed into service again: I rewrote it in 6600
assembly code (using the NCAR assembler that had been debugged on my 3600 
emulator) in order to make a debugger for the 6600 (Cray had neglected to put 
interrupts on the machine and it was very hard to control). This simulator used the 
machine itself as much as possible to retain speed (which was intertwined with how 
the main memory cycled — the 6600 could run 10 times the speed of main memory). 


Bottom line here is that emulating the target HW seems like a lot of work when 
bootstrapping, but it allows much better development and debugging, so what is 
eventually bootstrapped is much more of a self-sufficient system. 
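
The idea of emulating the target on the development machine can be sketched in a few lines of Python. This is a toy: the four-instruction accumulator machine below is invented for illustration and has nothing to do with the actual CDC 6600 instruction set, and the `trace` list is a hypothetical stand-in for the kind of debugging hook that is hard to get on real hardware.

```python
# A toy "target machine" simulated on the development machine.
# The instruction set is made up for this example.

def run(program, memory, max_steps=1000, trace=None):
    """Interpret `program` (a list of (op, arg) pairs) against `memory`."""
    acc, pc, steps = 0, 0, 0
    while pc < len(program) and steps < max_steps:
        op, arg = program[pc]
        if trace is not None:
            # Debugger hook: record machine state before each instruction.
            trace.append((pc, op, arg, acc))
        pc += 1
        if op == "LOAD":
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        elif op == "JNZ":              # jump if accumulator is non-zero
            if acc != 0:
                pc = arg
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown instruction {op!r}")
        steps += 1
    return acc, memory

program = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", None)]
trace = []
acc, memory = run(program, [2, 3, 0], trace=trace)
```

Because every step passes through the interpreter loop, stopping, inspecting, and single-stepping come almost for free, which is the development advantage described above.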


Similarly, it is always useful to make a simulation of the language even when it is to be 
compiled. This helped over and over in the experience with Smalltalk and its 
bootstrapping adventures (e.g. how Squeak was made and deployed).


Alan Kay 
Have designed a few programming languages - 5y 


What is this "live objects" in Smalltalk? I've gotten used to that “edit- 
compile-test-debug" cycle, and want to understand the philosophy behind 
Smalltalk (Pharo). 


John Thorpe’s answer is a good start. Now extend the idea to the entire system and 
think of what we used to call “objects” (now we have to call them “real objects” since 
“objects” were co-opted in the early 80s) as software computers semantically 
equivalent to computers on a network like the physical computers on the Internet 
(these ideas were invented in the same time frame by the same research community). 


In both cases we have active systems that not only provide services to users, but which 
also have to provide services to themselves (maintenance, improvements, etc.) while 
continuing to run. 


In the case of Smalltalk, the idea was that some of the active objects would be the 
interactive development "kit", some of the active objects would be code, some would 
be translators to the VM, some would help debug, and some would deal with errors in 
non-destructive ways. 


So most Smalltalks have been up and running for many years providing services, 
doing development, etc and have never been down, have never gone through a 
typical edit-compile-load-debug cycle, etc. The equivalent of the latter is done while 
the system is running, and the changes take about 1/4 second to take effect. 


This is extreme late-binding. In Squeak Smalltalk for example it is theoretically 
possible to make a change in the deep system kernel and replace the old one while 
the system is still running (I don’t know of anyone actually doing this). 
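
A loose analogy in Python (which is not Smalltalk, and whose runtime does not go nearly as far) shows the flavor of changing behavior while objects stay alive. The `Counter` class and `bump_by_ten` function are invented for this sketch.

```python
# Replace a method while an instance created under the old definition
# is still alive. Class and method names are illustrative only.

class Counter:
    def __init__(self):
        self.count = 0

    def bump(self):
        self.count += 1

c = Counter()
c.bump()                      # old behavior: count is now 1

def bump_by_ten(self):
    self.count += 10

Counter.bump = bump_by_ten    # "edit" the running system, no restart
c.bump()                      # same object, new behavior: count is now 11
```

The existing object keeps its state and picks up the new method on the next message send, because the method lookup is done late, at call time.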


There were many contributors to the design and integrity of this scheme, but the main 
credit should go to Dan Ingalls (I just “did the math”).


For completeness, I should remind readers that we are talking about a more than 40 year old
system, and quite a bit more along these lines could be done today. Here is a revived
Smalltalk from 1978 that I used in a tribute to Ted Nelson.


Alan Kay
Still trying to learn how to think better - Updated 1y


Why did Alan Kay once say GNU/Linux was a budget of bad ideas? What is 
generally wrong with GNU/Linux, and Unix-like OSes? 


I should learn to keep my big mouth shut (not the first time I've had this thought!).


The reason is that it is easy and OK to praise something in a sentence, but | think that 
a complaint should be backed up with careful discourse. Part of the problem here is 
that presentations that are “oral” are now being recorded and preserved. I would not
have bothered to complain in print about operating systems because I really wouldn't
like to expend the time to explain all the things that need to be explained to back up 
such a complaint. 


And Quora is not a right-sized forum either. So let me try some meta observations, 
and we can see if the readers can turn these into balanced observations. 


This section added Feb 11th 2022 


To be balanced, I should have included some of the “bad ideas” we put into Smalltalk
in the 70s: for example, using "Lisp-pointing” to relate everything: we needed late- 
binding, but not the dependencies! Etc. 


I didn't say much about scaling difficulties, including that even good ideas usually
have half-lives that are much shorter than ours. (So the Lisp hacks really worked well
for us in the small and earlies. Etc.)


One of the good ideas in the original Unix was to minimize the privileged kernel to the 
most important allocation and protection of time, space, and communications, and to 
have everything else be done in standard Unix processes. Another very good idea was 
Doug McIlroy’s insight of “pipes programming” to allow modules to intercommunicate
via messages and interpreters in each process. Skipping forward to my favorite version
of Unix, we have Gerry Popek’s “Locus” architecture in the 1980s, which could migrate
processes over networks, balance loads, and have them run seamlessly on different
hardware types.


Linus’ original kernel for GNU had much the same simplicity. 


The meta views can start with thinking about the Internet as we have it today, and 
what kind of machinery is needed to allow it to be used as a computing resource. 
There is TCP/IP, which disappears the lower level machine idiosyncrasies. An idea 
contemporaneous with Locus was Berkeley Sockets that smoothed some more things 
out but missed the bigger ideas in Locus. The much bigger and better idea about how 
to go about things — from Dave Reed's 1978 MIT PhD thesis — got missed. 


Quite a few things in computing get done for immediate reasons and conditions and 
then neither scale nor are redone. (The “willy nilly” house metaphor used in one of the 
other comments is apt.) It's worth pondering just how much early binding (trying to 
save one extra indirection) has cost in systems building over the years by making 
dynamic reconfiguring difficult to intractable. The HW manufacturers have not helped 
by making interprocess switching and communication so slow — and so few. 


C could be better, but since it doesn’t have to be coded in directly but can be a target 
for code generators of many kinds, we can see that the big problems in and around 
C-like schemes for programming and OS building are those of “linking and loading 
and binding”. 


So could we imagine an Internet where each hardware has a very simple SW kernel to 
allocate its resources, and the rest of what's needed for any application — processes 
— can be dynamically loaded (and cached when this is a good idea)? 


The basic meta idea here is that the more static are dependencies, the more difficult it 
is to reformulate a system as both resources and requirements change. And (to take 
one of Reed's points) the more that the definition of “time” is taken from the CPUs 
and simulated by the systems themselves, the more coherently scaling can be done. 


This is not an extensive enough answer, so I apologize in advance. I'll probably return
to it a few times if I can think of better ways to help visualize the actual problems.


Alan Kay
Have designed a few programming languages - 5y

Why is functional programming seen as the opposite of OOP rather than an 
addition to it? 


I hope for all our sakes that I can make this short ...


In the latter part of the 50s John McCarthy got more and more interested in what he 
started to call “Artificial Intelligence”. He was also doing some consulting and this 
brought him in contact with the SAGE air defense system: large systems of very large 
computers attached to radar stations and each other and usable by graphical display 
systems with pointing devices. 




John's reaction was “Every home in America will have one of these”. He could see that 
the networked computers could be thought of as an “Information Utility” (as a parallel 
to the existing utilities for electricity, water, gas, etc...) and that the terminals in the 
homes could provide many kinds of “information services". Among other things, this 
got him to advocate that MIT etc do “time-sharing” of their large mainframes ... 


He also realized that the computer milieu of the 50s — machine code and the new 
Fortran — did not intersect well with “most people in US homes”. This got him to write 
a paper in 1958 — “Programs With Common Sense” — and to suggest that what was 
needed for the user interface was an active semi-intelligent agent — the “Advice 
Taker” — that could interact with users in their commonsense terms, could learn from 
“taking advice”, could problem solve on behalf of the user and itself, and so forth (MIT 
AI Memo 17).


This got him thinking about how to implement such an Advice Taker, whose main 
mechanisms would be various kinds of logical deductions including those that
required actions. There wasn’t much to go on back then but a few gestures at “list
processing”, so he decided to invent a language that could be used to make the 
Advice Taker (and other kinds of robots), and more generally allow symbolic 
computation to take its place alongside the existing numerical computation. 


John was an excellent mathematician and logician, and so he also wanted to come up 
with “A Mathematical Theory of Computation” to put ideas old and new on a firmer 
basis. 


His result was LISP (for “LISt Processing”). I have written elsewhere about its
significance. 


Meanwhile, he was pondering just what kind of logic, math, and programming (he 
thought of these as highly intertwined) could be used to deal with a robot in the real 
world. 


<eliminating detail here> A conflict was between at(robot, philadelphia) and
at(robot, new york) which could not happen simultaneously, but could happen “over
time”. This was like the problem of contemporary programming where variables would
be overwritten (and sometimes even files), basically letting the CPU of the computer
determine “time”. 


This destructive processing both allows race conditions and also makes reasoning 
difficult. John started thinking about modal logics, but then realized that simply 
keeping histories of changes and indexing them with a “pseudo-time” when a “fact” 
was asserted to hold, could allow functional and logical reasoning and processing. He 
termed “situations” all the “facts” that held at a particular time — a kind of a “layer” 
that cuts through the world lines of the histories. cf McCarthy “Situations, Actions, and 
Causal Laws" Stanford, 1963 prompted by Marvin Minsky for “Symbolic Information 
Processing”. 


One of the ways of looking at this scheme is that “logical time” was simply to be 
included in the simulations, and that “CPU time” would not figure into any 
computation. 
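
A minimal sketch in Python of this "histories indexed by pseudo-time" idea follows. It is an illustration under assumptions, not McCarthy's formalism: the `History` class, its method names, and the use of integers for pseudo-time are all invented here.

```python
# Facts are never overwritten; each assertion is appended with the
# pseudo-time at which it starts to hold. Names are illustrative only.

class History:
    def __init__(self):
        self._events = []                      # (time, key, value) triples

    def assert_fact(self, time, key, value):
        self._events.append((time, key, value))

    def situation(self, time):
        """Return every fact holding at `time`: for each key, the latest
        value asserted at or before that pseudo-time."""
        facts = {}
        for t, key, value in sorted(self._events, key=lambda e: e[0]):
            if t <= time:
                facts[key] = value
        return facts

h = History()
h.assert_fact(1, ("at", "robot"), "philadelphia")
h.assert_fact(5, ("at", "robot"), "new york")

early = h.situation(3)    # the robot is in philadelphia in this situation
late = h.situation(5)     # and in new york in this one
```

Both facts coexist in the history without conflict because each query is about one "layer" of simulated time, and nothing depends on when the CPU happened to run the assertions.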


<more detail excluded here> This idea did not die, but it didn't make it into the 
standard computing fads of that day, or even today. The dominant fad was to let the 
CPU run wild and try to protect with semaphores, etc. (These have the problem of 
system lockup, etc., but this weak style still is dominant.) 


Systems that have used part or all of John’s insight include Strachey’s CPL, Lucid, 
Simula, etc. Look at Dave Jefferson's TimeWarp schemes, Reed's NetOS, Lamport's 
Paxos, the Croquet system, etc. 


To pick just one of these, Strachey in the early 60s realized that tail recursion in
Lisp was tantamount to “a loop with single simultaneous ‘functional assignment’”.
And that writing it this way would be much clearer by bringing the computation of the
*next* values for the variables together.


There are no race conditions possible because the right hand side of the assignments
are all computed using old values of the variables, and the assignment itself is done to
furnish new values for the variables all at once. (Looping and assignment can be clean
if separate “time zones” are maintained, etc.)
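
Python's tuple assignment happens to behave this way, so it can serve as a small illustration (the Fibonacci example and function names are chosen here, not taken from Strachey). The tail-recursive form and the loop with one simultaneous assignment compute the same thing.

```python
def fib_tail(n, a=0, b=1):
    # Tail-recursive form: the "next" a and b are passed together.
    return a if n == 0 else fib_tail(n - 1, b, a + b)

def fib_loop(n):
    a, b = 0, 1
    for _ in range(n):
        # Simultaneous assignment: the whole right-hand side is computed
        # from the OLD a and b, then both variables are updated at once.
        a, b = b, a + b
    return a
```

Writing the update as two separate statements (`a = b` then `b = a + b`) would read the new `a` while computing `b`, which is exactly the kind of ordering hazard the simultaneous form rules out.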


More main stream is that big data systems used *versions* instead of overwriting, and 
“atomic transactions” to avoid race conditions. 


Back to McCarthy and — now — objects. One of the things we realized at Parc was 
that it would be a very good idea to implement as much of John's “situations” and
“fluents” as possible, even if the histories were not kept very long.


For example, this would allow “real objects” to be world-lines of their stable states and 
they could get to their next stable state in a completely functional manner. In the 
Strachey sense, they would be “viewing themselves” with no race conditions to get 
their next version. 
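
One way to picture an object as a world-line of stable states is sketched below in Python. This is an assumption-laden illustration rather than how any Parc system worked: `Account`, `deposit`, and `WorldLine` are hypothetical names, and keeping every version forever is a simplification.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Account:
    """One stable, immutable state of the object."""
    balance: int

def deposit(state, amount):
    # Pure function: reads the old state, returns the next one.
    return replace(state, balance=state.balance + amount)

class WorldLine:
    """The object over time: an append-only list of stable states."""
    def __init__(self, initial):
        self.versions = [initial]

    @property
    def current(self):
        return self.versions[-1]

    def advance(self, step, *args):
        # The next version is computed only from the current stable one.
        self.versions.append(step(self.current, *args))
        return self.current

    def at(self, time):
        # Viewers look only at already-computed layers.
        return self.versions[time]

w = WorldLine(Account(balance=0))
w.advance(deposit, 10)
w.advance(deposit, 5)
```

A view or an UNDO operation can then be expressed as a read of an earlier version, with no state being modified in place.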


This would also be good for the multiple viewing we were starting to use. You really 
only want views to be allowed on stable objects (/relationships) and this can be done 
by restricting viewing to already computed “situational layers”. 


Parc was also experimenting with "UNDO" and the larger community was starting to 
look at “parallel possible worlds reasoning”. 


The acts of programming itself also wanted to be in terms of "histories and versions” 
and systems should be able to be rolled back to previous versions (including “values”, 
not just code). cf Interlisp, and especially the PIE system (done in Smalltalk by 
Goldstein and Bobrow). 


This was another motivation for “deep John” in future systems. I.e. do everything in
terms of world-lines and “simulated time”. A recent paper by Alex Warth shows some
ways that “Worlds” can be quite fine-grained: http://www.vpri.org/pdf/tr2011001_final_worlds.pdf


The last point here is that “Histories R US”. I.e. we need *both* progression in time for
most of our ideas and rememberings *and* we also want to reason clearly about how
every detail was arrived at (and to advance the system).


John McCarthy showed us how to do this 60 years ago this year and wrote it down for 
everyone to read and understand. 


So: both OOP and functional computation can be completely compatible (and should 
be!). There is no reason to munge state in objects, and there is no reason to invent 
“monads" in FP. We just have to realize that “computers are simulators” and figure out 
what to simulate. 


I will be giving a talk on these ideas in July in Amsterdam (at the “CurryOn”
conference). 


Alan Kay
Still trying to learn how to think better - 5y


Why is engineering one of the hardest fields to be creative? What should 
engineering education look like instead? 


I don't think that it is intrinsically hard to be creative in engineering.


An important part of engineering is that “the bridge must not fall”, “people must not 
get hurt", etc. So people “called"* to engineering always have this at the back of their 
minds. 


I used to utter the quip “Don't ever let scientists and mathematicians design your
airplane.” (Because they are even happier when it fails! They might learn
something!)


These considerations cast a conservative tinge over most engineering. However, 
because of the great advances in CAD<->SIM, it is now possible for considerably 
wilder safe experimentation to be done with design ideas. The deep concerns about 
safety are embedded into the fidelity and comprehensiveness of the simulations, and 
this allows the designers to be “more creative”. 


* it really is a “calling”, as to medicine, science, religion, etc. 


Alan Kay
Still trying to learn how to think better - 5y

Do you recommend receiving a bachelors, masters or doctorate in 
computer science? 

Originally Answered: Alan Kay, do you recommend receiving a bachelors, masters or doctorate in 
computer science? 

I recommend the lifelong process of becoming ever more educated, with particular
attention to getting above important thresholds of perspective and knowledge. 


Because we can't take “education pills", we have to work to get our brain/minds to 
learn and understand. This means that “becoming educated” is centrally autodidactic. 
Most highly educated autodidacts I know used universities as part of their process.
This is because universities can be extremely fruitful places to discover the existence of 
many ideas, and to encounter many perspectives — universities are great 
complementary processes to individual reading and exploring. They also can help 
greatly in giving a sense of the thresholds that need to be reached for various stages 
of “real education”. 


1. Computing is new enough to still be tricky, it is now lucrative enough to entice, and 
is seen as enough of a need by the commercial sector for it to heavily influence what 
universities (and hence what high schools) present as “computing”. 


Just taking the first of these three points, I have always recommended that students
*not* major in computing while undergrads. Both the newness and the “mostly 
making things” tendencies argue for a lot of context to be able to deal with ideas, 
fads, methods, novelty, “pop culture”, etc. It will help more to pick one or two majors 
in “really hard things” — I'd put “real science” first (which will require “real math”), and 
then “real engineering” (mechanical, civil, electrical, bio, etc.). 


Under this kind of plan, there could be a terrific “serious minor" in various aspects of 
computing and design. The “real major” subjects both provide analogies *and* great 
BS detectors for the fashions in a design centered field. 


If we look at the other two points from paragraph 1 above, then “enticing for jobs” 
and commercial pull is really going to muddy the whole idea of “education” because it 
tempts every faction into “vocational training” instead of “life educating”. (I happen to 
like “training” but it is a very different ballgame, and some self-discipline has to be 
exerted to make both work.) 


These problems have been going on for enough decades to have seriously
compromised the actual reasons for universities to the point where many academics
can't see the level of pernicious problems that are close to wiping out fields that
overlap with money making.


Two questions you (all) should ask deeply and seriously are “What if the word
‘science’ was really taken seriously in ‘computer science’?” and “What if the word
‘engineering’ was really taken seriously in ‘software engineering’?” Careful answers to
these will reveal the extent of the “Emperor's New Clothes”.


2. To me, grad school is primarily about being around “interesting influences” — could 
be people, projects, things, etc. This is to some extent true for undergrad, but grad 
school is all about these. Unfortunately, many of the same complaints about 
computing in universities also apply even to today’s best grad schools, including even 
what they decide to teach and research. 


(Recently I was at one of these absolutely best places, found they needed to make
advances in “strong AI” (aka “real AI”) and asked why don't you just do it? They said
(astoundingly) that their sponsors wouldn't fund it. I said but what about funding from
the university? They said that doesn’t happen any more... This is not just outrageous, it
is enormously dangerous.)


There's more that could be said about the problems of doing “real research” anywhere 
these days, but let's leave it for now. The main point to me is that (a) you still need to 
go where the interesting things are (b) you need to do a lot of work to understand 
good meanings for “interesting” *before* you start looking around, and (c) you could 
have a lot of difficulty doing “real research” even in a good school these days. (One
definition is “You are only doing research when you can still change your mind” ...)


When Ivan Sutherland was asked “What is a thesis?” he said “Something 3 people will 
sign”. So, to me, the most important act in grad school is choosing your eventual 
signers. They have to let you do what you need to do, but they also have to help keep 
you from being trivial and facile. 


And, yes, you should always finish your degrees. You can think of many reasons why. A
deep one is that it is a very good thing to learn how to work on extended hard projects,
how to hang in there, and how to finally “stop doing real research” enough to not 
change your mind any more to get something done. The combination of multiple 
perspectives and being able to do something with them is not a bad way to think 
about "getting educated”. 


Alan Kay
Have designed a few programming languages - 5y


What do you think of the statement that "an object-oriented approach to 
application development makes programs more intuitive to design, faster 
to develop, more amenable to modification, and easier to understand"? 
Originally Answered: What do you think of the argument that “an object-oriented approach to 
application development makes programs more intuitive to design, faster to develop, more 
amenable to modification, and easier to understand"? 

An all encompassing answer from before objects: “You can write COBOL in any 
language”. (For COBOL, you can substitute Fortran, machine code, spaghetti, etc.) 


I quite agree with the other (3 at this point) answers.


This is because clean and elegant software development depends on clean and 
elegant designers and builders. (In the early 60s, there were some quite beautiful apps 
done in macro-assemblers.) 


“Clean-and-elegant” usually leads to thoughts about more suitable tools and general 
outcomes, and these intertwine with dealing with some of the problems of designing 
anything. My 1966 insight that since computers were universal, you could use 
computers sending and receiving messages as semantic building blocks for 
everything, was my start on thinking about systems organizations and methodology 
that might help programming. The encapsulation would prevent interiors from being 
compromised or discovered, and the computerness would allow the interiors of 
objects to also be decomposed into networks of computers. 


This was initially a set of “math ideas”, and there were definite software engineering 
problems to making such ideas practical. This took some years to address by a 
number of us. 


The wonderful “Whole Earth Catalog” appeared while these ideas were being
pondered. Its editor, former biologist Stewart Brand, was particularly interested in
“Whole Systems”, and the catalog listed many of the most important books about
systems, design, and systems design.


I wish I could rhapsodize about all of the best ones here, but for our current purpose
let me single out Christopher Alexander's “Notes on a Synthesis of Form” (his 1964 
PhD thesis* (architecture) from Harvard just a few years before, turned into a book, 
now a classic). 


Alexander got interested in dealing with the complexity and mutual interference of
myriad “requirements” in large scale design (e.g. a whole new village in India).
This led to a search for clustering approaches that would reduce the overall spaghetti
problems of a large system. He used the Socratic/Platonic dictum that “You should
divide Nature at her joints as a good carver would” as a mantra, and tried to come up
with ways that could identify the joints (including a Fortran program he wrote that
could look for clusterings that would have minimal communications outwards).


If there was a module tool, then good clusterings could be put inside modules 
recursively, and the result would be a set of modules within modules with minimum 
external dependencies. This would induce “simplicity” along several important axes, 
etc. 
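
To make the clustering idea concrete, here is a minimal sketch in Python. It is not Alexander's Fortran program, whose details are not given here; it is a toy brute-force version under the assumption that requirements are names and their mutual interference is a list of pairs. The function names (`cut_size`, `best_split`, `modules`) and the `max_size` parameter are made up for illustration.

```python
from itertools import combinations

def cut_size(group, edges):
    """Count the interactions that cross the boundary of `group`."""
    return sum((a in group) != (b in group) for a, b in edges)

def best_split(items, edges):
    """Brute-force the two-way split of `items` with the fewest crossing links.

    Needs at least two items. Exponential in len(items): toy sizes only.
    """
    items = sorted(items)
    internal = [(a, b) for a, b in edges if a in items and b in items]
    first, rest = items[0], items[1:]
    best = None
    # Pin `first` on the left so each split is examined once; k stops short of
    # len(rest) so the right-hand side is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = frozenset((first,) + extra)
            right = frozenset(items) - left
            score = cut_size(left, internal)
            if best is None or score < best[0]:
                best = (score, left, right)
    return best

def modules(items, edges, max_size=2):
    """Split recursively until each module is small: modules within modules."""
    if len(items) <= max_size:
        return sorted(items)
    _, left, right = best_split(items, edges)
    return [modules(left, edges, max_size), modules(right, edges, max_size)]

# Two tightly knit groups of requirements joined by a single link (c-d).
links = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
print(best_split("abcdef", links))
print(modules("abcdef", links, max_size=3))
```

The split it finds cuts only the single link between the two groups, which is the “joint” in this tiny example. A real design problem would need a much better search than brute force.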


We can see that this isn't quite the same as having the modules wind up being 
“natural” and “intuitive”, but the “Nature at her joints” idea will guarantee some 
coherence with intuition. 


These kinds of design approaches were critical for the Xerox Parc work because the
hardware we could build was so tiny relative to what we wanted the software to be 
able to do. In practice we kept doing the software — lots of inventions needed for 
personal computing — and every two years doing a new development system and 
language — the software would suggest better abstractions in which it could be 
represented. The practical progress made here was primarily because we had a real 
genius — Dan Ingalls — who could deal with enough of my ideas for abstractions 
*and* could ignore them when needed, but could make complete practical systems — 
many quite beautiful — and do them again and again. 


His results (with the help of the whole Learning Research Group) were good exemplars 
as answers to your question. Rephrasing: I'd say that the better you get at design, the 
more late-bound versions of what you mention above will help you to achieve your 
goals.** 


* Alexander later repudiated this approach, as he had changed his mind about 
architectural design methods, but the book is quite a gem of thoughts and ways to 
approach complexity. 


** That being said, I think one of the biggest problems even today (we left large parts
unanswered) is the many kinds of dependencies that pollute attempts at good
systems design. I think that some of the good ways to deal with these will still involve
modules, but a rather different approach to systems organization.


Alan Kay
Had something to do with "Object-Oriented Programming" - 5y


What is the difference between Alan Kay's definition of OOP and Carl 
Hewitt's Actor Model? 


Not a lot of difference. You can find the history I wrote about all this online in “The
Early History Of Smalltalk”. 


Basically, I had the “dynamic object as whole computer” idea in Nov 66, highly
influenced and catalyzed by Sketchpad and Simula (especially the former), the initial 
designs for the ARPAnet, current ideas about processes as virtual machines in time- 
sharing and multiprocessing systems, and analogies from my math and biology 
backgrounds. 


I could see that the extensible language ideas of Irons could also be a way to receive
messages, and extend the syntax, in a system of dynamic objects. I poked at this
in my thesis.


I met Seymour Papert in 1968 and this changed my outlook on personal computing,
led to the Dynabook idea, and got me to start thinking about what children should be 
doing to learn “powerful ideas", and especially with the medium of computing. 


I realized I should finish my thesis, and put off the Dynabook for the next project.


But I found Carl Hewitt’s PLANNER language at MIT, a very important precursor
(and superset) to Prolog. This again almost did me in because it was clear that it 
would be wonderful to program “kind of like this”. 


At Parc, with thesis work in the past, | started to think about a children’s language in 
which they could do Logo-like things, Object-like things, and also PLANNER-like 
things; for example monkey-and-banana and block stacking kinds of problems. 


This led to the design of Smalltalk-71, which got interrupted by “a bet, and a hack” 
that immediately produced Smalltalk-72, an object language described in one-page, 
McCarthy-style, whose objects were kind of like closures that could parse their inputs. 
I gave a talk at MIT in the Fall of 72 to share these ideas, and the first Actors paper
adapting some of these ideas appeared the next year.
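
A rough way to picture “objects kind of like closures that could parse their inputs” is the following Python sketch. It is not Smalltalk-72 syntax or semantics; `make_counter` and its two message words (`add`, `value`) are hypothetical names chosen only for illustration. The point is that the state is private to the closure, and the receiver (not the sender) decides how the incoming tokens are read.

```python
def make_counter(start=0):
    """Return a closure that behaves like an object owning private state."""
    count = start  # reachable only by sending messages to `receive`

    def receive(*tokens):
        nonlocal count
        stream = list(tokens)
        # The receiver consumes the message stream itself, one token at a time.
        word = stream.pop(0)
        if word == "add":
            count += stream.pop(0)
            return receive  # answer self so that messages can be chained
        if word == "value":
            return count
        raise ValueError(f"counter does not understand {word!r}")

    return receive

counter = make_counter(10)
counter("add", 5)("add", 2)
print(counter("value"))
```

Nothing outside the closure can touch `count` directly; the only way in is to send a message the object is willing to parse.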


Subsequently, the Actors work remained more true to the original object ideas than
our Smalltalk work at Parc. This is partly because our aim was to “invent practical
personal computing on a personal computer”, and to use the object oriented
languages for this purpose we did a major rewrite every two years, with a genre
change every 4 years, to do what was necessary to get a real system with UI and lots
of capabilities going and deployed.


A few years ago we brought back to life (from a rescued disk pack that Xerox had
trashed) the Smalltalk we did in 1978 for the portable Notetaker computer, and I
then used it to do a tribute for Ted Nelson. Worth peeking at. This system was the
one that Steve Jobs saw the next year in his famous visit in 1979.


Actors research produced some very important results. (And of course, the Parc work 
had a large impact generally.) But the next steps for PLANNER like languages, and 
especially for combining PLANNER ideas with object ideas, have had less lift and 
success. These are still good ideas, and people should start thinking about them again. 


Alan Kay
Still trying to learn how to think better - 5y
When did the CAD->SIM->FAB engineering process become formalized? 


I'm not sure it has, but I think anyone really interested in engineering should find out
how things were done before digital computers, and then before analog computers. 


There was still a lot of "D” that resulted in models, and those models could be 
tested/simulated in a variety of interesting ways. Take a look at bridges, ships, 
airplanes, buildings, etc. 


One of the trickiest things was to be able to deal with scaling in realistic ways. What
does a wind tunnel really tell you from a model? What does a model ship really do in
waves? And so forth. The “simulations” were much less accurate than can be done 
now, but the key ideas go back into antiquity. 


The "FAB" part has always been even more difficult, so testing on the artifact was 
generally only done when it was as easy to make as a model. 


Alan Kay
Have designed a few programming languages - 5y


Why is object-oriented programming more about messaging than objects? 


I'll give a short answer here and then return later for more.


1. If you are sending messages to “something”, then what kinds of messages do 
you want to send? For example: are you commanding, are you requesting, 
are you suggesting, are you negotiating? 


2. If you can send messages to “something”, then you are not tightly bound to 
them forever, you can change your mind, and they can change also. 


3. If you can send messages, do you really want to try to send to specific
targets? It could be better to send out requests by need, and also tell
the system that you can supply x, y, and z to the commonweal.
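
The third point can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Commonweal` class and its `supply` and `request` methods are invented here, and a real system would need negotiation, failure handling, and much more. It shows a requester asking by need, without naming a target, while suppliers announce what they can offer.

```python
class Commonweal:
    """Toy registry: suppliers announce offers, requesters ask by need."""

    def __init__(self):
        self._offers = {}  # need name -> list of supplier callables

    def supply(self, need, supplier):
        """Tell the system that `supplier` can satisfy `need`."""
        self._offers.setdefault(need, []).append(supplier)

    def request(self, need, *args):
        """Ask for `need` without naming a target.

        Suppliers are tried in the order they registered; one may decline
        by answering None, in which case the next one is asked.
        """
        for supplier in self._offers.get(need, []):
            answer = supplier(*args)
            if answer is not None:
                return answer
        return None  # nobody could help

system = Commonweal()
# First supplier only handles non-negative numbers and declines otherwise.
system.supply("square-root", lambda x: x ** 0.5 if x >= 0 else None)
# Second supplier picks up what the first one declined.
system.supply("square-root", lambda x: complex(x) ** 0.5)

print(system.request("square-root", 9))
print(system.request("square-root", -4))
print(system.request("translation", "hello"))
```

The sender never learns which supplier answered, so either side can be replaced later without the other being tightly bound to it.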


Alan Kay
Have designed a few programming languages - Updated 5y

Is it possible to learn any programming language by doing reverse 
engineering? 

I'm not completely sure of your exact meaning for “reverse engineering”. But at the 
University of Utah ARPA project the newest grad student was given the toughest task 
no one there wanted to do. 


The one given to me when I arrived in 1966 was to “get Algol going on the university
mainframe” (a batch processing Univac 1108). There were several large mag tapes, two 
thick listings of machine code, and a little documentation on the system (the latter 
was pretty much incomprehensible because it was written in Norwegian that had been 
transliterated into English (simple sentences worked but many terms came out in odd 
non-standard ways)). 


Norwegian, because this was actually the first Simula system, which was piggy backed 
on the Case-Western Reserve Algol done on the predecessor machine in 1965. 


Here skipping the preliminary forays, we eventually took the machine code listing on 
fan-fold paper and spread it along the 150+ foot long hallway of Merrill Engineering 
Building. Three of us grad students crawled* over this listing with colored pencils 
marking things and yelling out discoveries and questions. After a few days of this we 
had found most of the main sections, and especially that the storage allocator was not 
like that of Algol, but was allocating and recovering instances of Algol blocks in ways 
that allowed them to persist over time. We eventually realized that what the first 
Simula called an “Activity” was what Sketchpad called a “Master” and what the next 
version of Simula (and we) call a “Class”. And that what Simula called a “process” was
what Sketchpad (and we) call an “Instance”. 


That allowed us to read the spotty documentation, and the “code crawls” also found
the hooks to an OS (such as it was). After a few tries (no debuggers on this
machine) we got the system running, and I was able to get the Simula “hospital
simulation” (a kind of job shop model) running.


The big deal for me was to see that Simula was a “poor man’s Sketchpad” on the one
hand (it didn't solve constraints automatically, or at all), but on the other, because of
the way you could program it, it would allow many things to be explored that
Sketchpad wasn't able to handle.


Most important was that the two “almost-object” models in Sketchpad and Simula,
thought about together, were the same abstraction as the virtual machine processes
in a time-sharing system and the individual computers that were going to be on the
ARPAnet (then in development). This led to a big light going on, and what I call
“Object-Oriented Programming” (a very bad term, I should have called it “Agent Based
Programming”).


*an interesting UI for code perusal! 


** It's worth pondering that Lisp is a kind of programmable meta-language, where the 
trade-offs are very different than most people are used to. It's kind of a universal 
material, so it rewards deep design thinking, but many people flounder because it 
requires more imagination than most languages. I.e. trying to reverse engineer Lisp is
very illuminating if you look at how a Lisp is made, but just trying things will likely
leave most programmers quite in the dark.


Alan Kay
Still trying to learn how to think better - 5y
Are the greatest works of a field performed early on in the field's lifetime? 


Well, let's take Physics or Biology, etc., as examples, and let me ask you what do you 
think? 


This is a hallmark of a “real field”. It's not clear that Computing has gotten there yet — 
one clue is that “real fields” build on the great work in their past, and “pop culture 
pursuits” don’t (and generally do “a variety of things” but without much growth). I
think of Computing as having devolved (not completely) to the latter, but others
might disagree.


Alan Kay
Still trying to learn how to think better - 5y


How can I see my design work with a critical eye? 


Mark Twain advised writers looking at their own writing: “If you see a phrase you 
particularly love, strike it out without hesitation!” 


It helps to “create with love” and “debug with disdain”. If you are critical too early in
the process you will kill ideas before they have any chance to surface. I write my ideas
down and put them away so I can have them but not get entranced by them.


Scientists use other scientists to help them debug (in many ways, “Science” is the 
whole community that helps deal with fondly held ideas and noise). 


I was greatly aided in my 20s by gradually transitioning from deriving identity and
self-worth, etc., from worrying about the goodness of my results to worrying about the
quality of the effort I was putting in, and the quality of sustaining the effort on things
that seemed to be important visions, issues, and problems.


This takes a lot of the glitter out of the creation, and one can still be successful in the 
quality of effort process by being extremely critical about the designs. 


I would venture that most colleagues I've worked with over the years might say that
I'm much too critical. I hope they would also think that I sustained efforts pretty well.


Alan Kay
Still trying to learn how to think better - 5y


What are the best resources for learning systems and networks according to 
Alan Kay? 

Originally Answered: Where Can We Learn Systems & Network Which Alan Kay Talks About? Any 
Book To Understand Cell & Neuron? 

Let's see if we can pool efforts to come up with a good answer to this question,
especially with more current sources than some that influenced me.


I liked the discussion of systems and design in Christopher Alexander's “Notes on a
Synthesis of Form” from 1964. Books by both Bertalanffy and Waddington were 
excellent. 


My favorite molecular biology book has been “The Molecular Biology of the Cell” by
Alberts et al. “The Machinery of Life” by Goodsell has the most wonderful illustrations
to help visualize what is going on. A good one on development (morphogenesis) is
“Endless Forms Most Beautiful” by Carroll. A more recent one, perhaps to be read
before Carroll's book, is “Life Unfolding” by Davies. I found another recent book, “Cell
Biology by the Numbers”, to be really helpful.


I'm very partial to Minsky’s “Society of Mind” for an approach to a systems design for
“psychology”. 


For an example of large scale systems thinking in computing, try David Reed's 1978
MIT Thesis on “NetOS”, an operating system design for Internet sized systems. I got a
lot of insights from various papers by Butler Lampson (see his website).


There are many more, but let's see what others come up with here. 


Alan Kay
I am the Alan Kay in question. - 5y
What has happened to Alan Kay's View Points Research Institute website? 


Looks like a combination of an overzealous bot plus human error. Should be back in a 
day or so. 


Alan Kay
Still trying to learn how to think better - 5y

What are the main successes of the AI group at Xerox PARC during the 70s
and first half of the 80s? How much influence did it have on other groups 
(like Smalltalk group) and vice versa? 

Right now I can’t produce the level of answer this question deserves. I'll have to do
some homework and then come back to add and fix. 


There were many stellar “Strong AI” (have to use this because “AI” has now been
colonized by smaller goals and scopes) and related researchers at Parc, including (I'll
miss many important ones in this pass) Danny Bobrow, Mark Stefik, Terry Winograd,
John Seely Brown, Richard Burton, Brian Smith, Ira Goldstein, Kurt VanLehn, etc. I will
include Warren Teitelman here because his MIT thesis was a deep AI thesis and he
wound up becoming the main force behind Interlisp, with special contributions in
various kinds of system reflection.


There were also many researchers who made extended visits. 


I will punt on trying to list the major systems produced at this point, but: KRL, PIE,
Steamer, etc. 


I can answer the last part of the question. Besides the intrinsic interest of “what lies
behind intelligence?” and “what can and should be done with real intelligence?”, there
are also deep and interesting questions about tools, for example languages to make
languages, especially representation and inference languages, ontologies, etc., and
questions of what is actually needed to make “experts” and what is actually needed to
make “experts” less “brittle” in the larger world of knowledge and thought.


My view was that many of the AI researchers were working on the very problems that
were needed to understand and create the next generation of programming
languages and development systems for pretty much everything. For example, I was
quite taken in the late 60s by Carl Hewitt’s “PLANNER” system (much more than just
something that also happened before Prolog). I thought it could be combined with
object ideas I'd been having to make a whole new way to think about programming
and representing.


This was my initial plan for Smalltalk. Fate intervened with something else. But these 
ideas reflected back to Carl Hewitt in his later “Actor” ideas, which are more like the 
early Smalltalk ideas, and which went beyond in a number of ways. 


To just give one more example, in the PIE system, Ira Goldstein and Danny Bobrow did 
a quite wonderful major permutation of Smalltalk to institute both “roles” (they called 
them “perspectives”) and “world-lines” that used some of McCarthy's “fluents” ideas to 
allow a much stronger control and modeling of time. Both of these were major 
advances, and should have led to much more (they wrote 4 papers that were 
published as a Parc “Blue Book"). 


I'll leave this here for now ... 


Alan Kay
Still trying to learn how to think better - 5y


What advice would Alan Kay give a curious individual to improve their 
ability to think and learn? Is there a place to see his library — every book, 
person, and research he has studied? 

Originally Answered: What's Alan Kay Recommendation For Curious Individual To improve Ability To 
Think & Learn? Is There A Place Where I Can Get To See His Library & Every Book/People/ Research
He Ever Studied? 

The people I've admired — many from afar (such as Bertrand Russell), some as friends 
and colleagues (such as Jerome Bruner) — share the traits of trying to (a) find ever 
more perspectives on things they are aware of, (b) find more things they weren't 
aware of, (c) interrelate what they've learned, and (d) make mental spaces for their 
own ideas where only the “perfume” of what they know lingers[*]. 


A useful ploy is to consider oneself quite blind, and to have the goal to find ways to
see, however murkily. If the murkiness can be assessed to one degree or another, then
some footholds can be established to help parts of the process get “above threshold”.


It might be an oversimplification, but I think all of this has a lot in common with the
kinds of processes needed to learn and play classical music, modern science and math,
etc., meaning that both the “art” part and the “technique” part have to be developed
beyond “certain thresholds” in order to be effective and more than just gestures. This
can be excruciating sometimes: for example, I love music but require quite a bit
more practicing than most musicians for comparable results. This was just the luck of
the genetic draw. But the love helps the work and the art, and I think this is an
important general key to making progress.


A good slogan is “Technique should be the Servant of the Art, not the Master”. This is 
tricky to manage because technique is really needed; it is easier to advance; it can 
impress others to the point of misleading both them and the artist; and it often makes 
simplicity difficult to grasp and achieve. 


Despite being a lifelong autodidact and voracious reader who had a difficult time
being at all patient in school, I found dealing with official school processes
(most especially in college and grad school) to be really important. I needed that set
of external ideas, standards, assessments, even when they weren't very good; in the
end they helped. Most autodidacts I know who try to do this on their own rarely learn
enough from enough perspectives to be “vital thinkers”. I also found that plugging
away on stuff that was very difficult to learn really helped, even just the “plugging
away” part, and just the idea that one’s own nose is good, but not sufficient.


Frank Oppenheimer — the creator of the San Francisco Exploratorium, a physicist, and 
the brother of Robert — once told me that when they first set it up he got many 
complaints from some of the sponsors and donors. They wanted and expected “a 
science museum” and they complained that what he’d made was a chaos of 2000 
children running around and breaking things! 


He said to them “You don't understand. The door to Science is to realize that ‘the world 
is not as it seems’: We've made 500 different hands-on exhibits that show this, and we 
figure that 2000 kids bumping around 500 exhibits will produce matchups so that a 
particular kid and a particular exhibit will trigger off the deep hit that ‘the world is not 
as it seems’. Many children will never be the same after this, and they are ready to step 
through the door into Science." 


This parable works in the large. A particular “great book” may not match up to a 
particular curious learner. I advocate reading lots of books, and to get fluent enough
in reading to also read books that don't match up, or sometimes are even mediocre. 
Some of this overhead will pay off later as one changes and grows. A good library with 
a few thousand books is a kind of “exploratorium” for readers. 


A university is another kind of exploratorium if one has learned that universities have a 
dual role (a) to provide many perspectives and examples of knowledge that one 
doesn't have, and (b) in parallel to be used by the learner to pursue interests. These 
quite different routes both need to be taken. The former is too random by itself, the 
latter is too inward and ignorant by itself. 


Einstein said that “Love is a better teacher than Duty”. If the love comes from curiosity 
in the large then enough duty will follow to keep the processes from becoming too 
narcissistic and narrow. 


[*] Footnote: the (d) part above was not clear to one commenter. We have limited 
mental resources — called “cognitive load” — and this means we are easily saturated 
before we have enough resources on the one hand, or enough capacity left to think 
our own thoughts on the other. The thoughts will generally be trivial if we don’t know 
much, but will be only the thoughts of others if we don't make space for ourselves. 


In particular, we are quite overwhelmed by the clamor and detail of what is thought of 
as “normal” and “reality” in the present. We have to find a quieter place in which to 
think (and where “reality” can be safely considered as a human construct, and often a 
social one). 


I've advocated “learn everything and then forget it except for the perfume”. This can 
create a mental space for thinking which will inescapably be helped by what we know 
— it's really hard to completely forget! — but in which what we know (mostly 
meaning what we believe!) is far enough away to allow us to feel things, listen to our 
subconscious whispers, and generally barge around. “Most ideas are mediocre down 
to bad" so we need to do something with ideas also — one strategy that works is 
instead of diving in, to note them down so they can also be put on the back burner for 
possible later use. 


When something more promising wafts up, it will include something that is like an 
“odor”. This odor will very likely be like some of the perfume hanging around, and the 
perfume can be followed to what you learned. If you've been able to suppress the 
present a bit, the odor will likely take you to parts of the past that didn’t seem relevant 
(perhaps because they didn’t contribute to the current “reality”). 


I think it’s obvious to every reader that if you only learn things that contribute to your
near-term specific goals, you will very likely not wind up with any interesting versions 
of the “other pasts” that can help with new ideas. This is one of the most powerful 
reasons for general learning rather than just “problem solving learning”. The latter 
very often misses “problem finding” of what should actually be worked on. The result 
is that many are not just trapped in a weak present, but lack the perspective to even 
see that this is a large part of their problem. 


Our human context mechanisms — many of which correlate with language — want 
“crisp topics” — but the oldest senses, especially smell, provide superposition 
properties — spatial, temporal and communicative — that are extremely powerful for 
animals that are highly developed for this, and when thought of as a metaphor, for us
humans when we are trying to deal with lots of things without getting overwhelmed.


Alan Kay
Still trying to learn how to think better - 5y


How does Alan Kay identify talent in young researchers? 

The wonderful Dave Evans did not look at transcripts, but was interested in resumes
and “sparks”. He over-admitted, treated his students like gods, and then (I found out
later after I was on the faculty), at around the two year point, the faculty would meet
to decide if the student was a “real person” (meaning were they the real deal as a 
researcher). If the consensus was “no”, then the student would go out one door with a 
Masters degree; if “yes” they would get a PhD. 


ARPA was willing to pay for this, and (I think) many of the ARPA projects had
somewhat similar processes. Dave was still a fairly extreme edge of this. 


Most of my career has not been directly in universities, but a variant of this idea was 
used at Parc, and especially for interns and visitors. This is a really nice way to get the 
kind of experience needed with a person before committing to the long term close 
relationships that research teams need. 


MIT had a terrific intern program with technical companies — called “6A” — where an 
intern would spend two summers and then a whole year, and wind up with a Masters 
at the end. The first summer would be “to see”. The second summer, the intern and 
the groups would have an idea of “match-up”, and the year would be a concerted 
project. 


I think most of my impulses for “interns and visitors” were “sparks”, and a lot of these 
resulted from talks, and sometimes classes, I'd give. 


So the short answer is: give them chances! 


Alan Kay
Still trying to learn how to think better - 5y

Why did the federal government invest $117 million into the Radiation Lab, 
Servomechanism Lab, and the High Voltage Lab at MIT in the 1930s? 

I'm in London right now so don’t have access to my copy of Nilo Lindgren’s book 
about 100 years of EE at MIT. 


But I do recall that both the Rad Lab in Building 20 (which eventually invented about half of
all the radar systems used in WWII), and the Servo Lab (trainers in WWII, afterwards: 
Whirlwind computer, core memory invention, early interactive graphics, real-time 
simulations, etc) were not started until about 1940. 


All three of the labs had connections with Vannevar Bush, especially during WWII 
when he was Roosevelt's Science Advisor, but I recall that he was already active in DC
in the 30s as well as continuing his deep connections with MIT. 


I think that the High Voltage Lab was started when they got Van de Graaff from the UK
in the 30s to start making Van de Graaff HV generators — and I believe that Bush was
also behind this. 


(Copying in my answer to Chris Lee’s comment) 
There are good books about: 


• the Rad Lab (more or less initially funded by Alfred Loomis around 1940
anticipating that the US would eventually be in the war, and radar would be 
needed, and that the superior British technologies should be taken as the 
starting point) 


• the Whirlwind part of the Servo Lab (which has a little about the WWII role in
the first part of the book). 


• There is also a pretty good biography of Bush available, which I think is
indispensable to much of the technical history of MIT from the 20s on. 


Alan Kay
Still trying to learn how to think better - 5y


Where did Alan Kay learn about how context affects the ability to think? 

I was lucky enough to learn to read fluently several years before attending school, and
by the time I did, had read enough to have encountered different points of view on
some of the ideas that were taught as hard truths in first grade. When I piped up and
said “But I read this in a book”, I was not encouraged by the teacher, and I soon got
the idea that school was not about “books” but about “the one book” — their book — 
for each subject. But, via reading, that was already too late for me — I'd gotten into 
the larger world of ideas. 


The first adult book I read all the way through was my father's copy of Edith
Hamilton's “Mythology”, and the most important part of this book was the tacked on
Norse myths after those of Greece. I realized these were essentially the same stories
with different names for the characters, and that was illuminating — and also helped
greatly in resisting attempts to teach me the local religion.


Another book that my father had was Breasted’s “Ancient Times” about a wide variety 
of early civilizations, each with its own ways and beliefs. Reading many books, 
including lots of golden age science fiction in the 40s and 50s helped. 


Getting deeply into science as a young child helped. I loved rockets and jet planes
around age 5 or 6 (ca 1945-6) and got a present around age 7 of Willy Ley’s “Rockets,
Missiles, and Space Travel”. This had a lot of interesting stuff in it, but the most striking
thing I can recall from this book was the discussion of Hohmann orbits: basically that a
spaceship is also in orbit and can't just “drive” from one planet to another, but must 
follow an orbital path of its own to catch up with an also moving planet where it will 
be, not where it was when the voyage started. 


However, I don’t think I can pin down a good answer to Eddie’s exact question. I got
interested in why people in crowds do some of the things they do — many of them 
horrible — this got me into Sociology, and then into Anthropology, which is about the 
contextualized thinking of human cultures (I wound up minoring in Anthro in college).


Certainly, I can remember the big hit when I finally understood what McLuhan was
driving at. I didn't understand “Understanding Media” when it first came out, but was
motivated a few years later to spend an entire summer trying to understand McLuhan. 
A good route for me was through his earlier book “Gutenberg Galaxy”. This is a very 
large idea about “contexts” and that the fluent learning of a context is not just a 
simple layer of symbols, but what has to happen to our brain/mind to be fluent in 
most things. 


A book that helped around the same time was Koestler's “The Act of Creation”, which 
looks at contexts from a different and very useful perspective. 


Another way to look at insights like these is as a transition from things you already 
know to a different world, where you realize that you understood them, but you didn't 
*understand* them. That can lead to active searches for stronger more illuminating 
worlds. 


Alan Kay
Still trying to learn how to think better - 5y

What are examples of "Perfect" and "Better" in regards to Alan Kay's 
"Sweet Spot"? 

Mark Miller's answer nicely captures much of what I was trying to communicate in the
aphorism “Better and Perfect are the enemies of ‘What is Actually Needed’”. 


The one thing I might add is that one of several useful ways to think about the
“MacCready” sweet spot — what is actually needed — is as the lowest thing that is 
*qualitatively beyond* “better”. 


As Paul said when he started thinking about man-powered flight, which the best 
people in the world had failed at for more than 50 years despite several large prizes: 
“The problem is that we don’t understand the problem”. 


This got him to drastically change goals (I tell this story in a number of talks which 
have found their way to YouTube). 


He essentially told his crew to forget about man-powered flight but to create an
airframe that could withstand “10 crashes a day” — by being fixable with tape and wire.


With this, in a few weeks they were able to make more test flights (and crashes) than 
everyone else in history over many decades, and got a much stronger sense of the 
problem. Once understood, this turned out to be relatively easy to solve, and in about 
6 months from the time Paul started thinking about doing man-powered flight, they
won the first of the prizes, and shortly thereafter, the big prize for a man-powered 
flight across the English Channel. 


When asked by British journalists why this was possible compared to everyone else, he 
said "They were trying to make an airplane, we were trying to do man-powered flight”. 


Another way to understand this idea is to notice that the learning process frequently 
really requires going away from what was thought to be the goal. This is hard for us
primates! But if it is really new, it can’t be characterized in the existing context, and a
new context has to be created to hold the new idea.


I've told this story many times to try to help people understand why Parc and the 
ARPA-IPTO before it were so effective compared to most other efforts at hardware 
and software design. The key was that ARPA was very happy to fund “problem finding”
— not just the "problem solving” that wrongly-named “rational organizations” wanted. 
By the time Parc started, ARPA had done 8 years of "finding the good problems” and 
also creating the researchers who would go to Parc and complete the finding process 
with deep solutions. 




Almost all the projects we chose at Parc were at that special sweet spot where they 
were just barely doable, but also above the qualitative threshold, so that just barely 
doing them opened whole new worlds. 


I should say something about “Perfect”. Picasso said “A work of art is never finished,
just abandoned”, and this captures some of the snares of “Perfect”. Human beings 
tend to be a lot more pragmatic than idealistic, so “Better” is a bigger problem for 
most. But there are those — I will include myself here — who hear the siren’s song of
perfection. 




For these, Picasso's “abandoning” is good to learn. One way to do it on large projects 
is to associate with colleagues whose strongest urges are to “complete”, and to just let 
them do this as they see fit, as one gets entranced by beautiful new unreachables. 


Alan Kay
Have designed a few programming languages - 5y


What does it mean for a language to have Smalltalk-inspired syntax? 


There are three main answers to this question because there were three somewhat 
different Smalltalks ('72, '74, then '76, '78, then '80).


In the first version, each object was passed a reference to the message, and did the 
work itself to parse and decipher it (this idea was partially inspired by Ned Irons’ (non- 
object) extensible language IMP). This meant that the programmer was specifying the 
syntax of the language when defining a class — there was almost no base syntax. The 
result was a highly extensible language in each of syntax, semantics, and pragmatics. 
This was slow, but general, and did not guard against every problem (very Lisp-like in 
many ways, rather like an FEXPR for everything with more help in parsing). There were 
many possible ways to refine and improve this idea, but we got a lot out of it just from 
the very first simple implementation, which in part, was a response to a bet about how 
small a definition could define everything (see “The Early History of Smalltalk” for 
more details about how all this progressed.) 
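
To make the "receiver parses its own message" idea concrete, here is a minimal Python sketch. It is an analogy only, not Smalltalk-72 code: the names `MessageStream`, `Number`, and `receive`, and the token words "plus" and "times", are all invented for illustration.

```python
from collections import deque


class MessageStream:
    """The rest of the message, handed to the receiver as unparsed tokens."""

    def __init__(self, tokens):
        self._tokens = deque(tokens)

    def peek(self):
        # Look at the next token without consuming it (None when exhausted).
        return self._tokens[0] if self._tokens else None

    def next(self):
        return self._tokens.popleft()


class Number:
    """A receiver that defines its own tiny syntax by how it reads the stream."""

    def __init__(self, value):
        self.value = value

    def receive(self, stream):
        result = self.value
        # The object itself decides which tokens it understands, and in what order.
        while stream.peek() in ("plus", "times"):
            op = stream.next()
            arg = stream.next()
            result = result + arg if op == "plus" else result * arg
        return Number(result)


# The receiver reads strictly left to right, so this is (3 + 4) * 2.
answer = Number(3).receive(MessageStream(["plus", 4, "times", 2]))
print(answer.value)
```

There is no grammar anywhere outside the class: changing what `receive` does with the stream changes the "syntax" that this kind of object accepts, which is the flexibility (and the cost) described above.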


When the Alto didn’t get superseded as planned, there was a move to optimize 
Smalltalk, and also clean up its self-definition. Dan Ingalls did a lot of the heavy lifting 
on this, and Smalltalk-76 — and later the tinier Smalltalk-78 for the portable 
Notetaker computer — were the results. These went to a fixed syntax scheme so code 
could be compiled ahead of time into byte codes (like Euler, and my thesis language 
Flex), and Ingalls chose a simple syntax scheme to reflect the most used extensions 
and conventions in Smalltalk-72 — much of which could be called “keyword” or 
prepositional. IMO, his choices were a great compromise, and resulted in a highly 
readable syntax (certainly my favorite of the later Smalltalks). You can see this 
language described in full in Dan's really great POPL '78 paper (which should be 
online). 


When Smalltalk-80 got worked on, the designers forgot a few things about human
beings, and went for slightly simpler more uniform syntax, and left out a few ideas
that could have helped readability (for example, “implicit self” permits prefixing when
that helps readability, especially for control structures like If, While, etc.). Since 
Smalltalk-80 was the release version from Xerox, this simple keyword oriented syntax 
became the standard (I think to the detriment of adoption of the language). There will 
undoubtedly be explanations of this syntax in other answers to this question. 


There have been a few further ideas along these lines since. For example, CodeWorks 
by Mark Lentczner was a very nice syntactic design for an OOP language, and noted
that if you use keywords/prepositions, then you should be able to order the
parameters as desired, and also be able to have defaults for omitted parameters. I
made a suggestion while still at Parc that a good idea might be to have a simple 
grammar for making up messages (this would be a nice intermediate between a fixed 
syntax and the open syntax of Smalltalk '72). This was not taken up, but I think that
something like this was done for the Grace language. 
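
The point about reorderable keyword parameters with defaults can be seen in a small Python analogy. This is not CodeWorks or Smalltalk syntax; the function `draw_line` and its parameter names are hypothetical and chosen only to show the behavior.

```python
def draw_line(*, start, end, width=1, color="black"):
    """Keyword-only parameters: each argument is named, so call order is free."""
    return {"start": start, "end": end, "width": width, "color": color}


# Omitted parameters fall back to their defaults.
a = draw_line(start=(0, 0), end=(10, 5))

# The same request with the keywords in a different order.
b = draw_line(color="black", end=(10, 5), start=(0, 0))

print(a == b)
```

Because every argument carries its own label, the reader never has to remember positions, and the callee can add new optional parameters without breaking existing senders.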


Alan Kay
Still trying to learn how to think better - 5y


If you had to choose just one word for your gravestone, what would it be? 


Curious 


Alan Kay
Have designed a few programming languages - 5y


What will happen if one rewrites the entire Linux using Smalltalk? 


I'm going to translate this question into “What are good ways to rethink operating 
systems in our day and age?" 


We generally assume that we have computer(s) that not only have CPU(s), memories 
of various kinds, and connections to networks, but that incorporate some forms of 
hardware protection for their resources. The nature of the latter will affect the kinds 
and amounts of work needed to be done “at the bottom”. 


On an architecture like the Burroughs B5000 series, almost everything important for 
an OS is already in the hardware of the machine, including the first great protection 
scheme that would later be called “capabilities”. At Xerox Parc we made microcoded 
computers, where the microcode cycles were about 5 times faster than main memory 
could cycle, and which used 16 program counters to allow the microcycles to be 
interleaved with zero overhead task switching. This allowed us to quite easily replace 
what is usually thought of as an OS with the “real object” oriented language Smalltalk. 
Using an architecture like the B5000 in microcode, and with the ability to reflect what 
we learned back into the microcode, turned out to be very powerful and very simple 
to do. 


On any kind of machine, we want to allow programmers to safely and efficiently use 
any levels of the system they need to accomplish their goals. So, on most machines, 
the interplay between the deep kernel and the hardware will generally be tricky — 
most computers are not as nice as the B5000 or the Parc computers. For example, any 
decent CPU these days should have completely protected processes and not have to 
resort to unprotected “threads” ... but this is not generally the case, even though very 
little silicon is required to do this. 


One of the best ideas in the original Unix was to try to make the “protected mode” 
kernel as small as possible, and get everything else needed via standard user-mode 
processes. The 1000 lines of C “Linus kernel” quite rightly went back to that idea, and 
was a great attempt to deal with protection and allocation of all resources with no 
additional BS. 


(It's worth noting that Unix was done in the early 70s — on PDP-11s — as Parc was 
starting up with *very* different ideas about both hardware and software — although 
Doug McIlroy’s way to think about Unix was along some of the same lines.)


We now need to look at the nature of systems and their design to motivate what we'd 
like to wind up with. There are several good books from the dawn of “systems 
thinking” that have excellent insights (including those by Bertalanffy, and by 
Waddington). 


One of my favorites is Christopher Alexander's PhD thesis “Notes on a Synthesis of 
Form” (which has been continuously in print since 1964). The thesis is about design, 
and how constraints and dependencies make any real design problem difficult to sort 
out. Alexander uses Plato's quote “You should separate Nature at its joints as a good 
carver would” to point out that how you try to understand a system (and to make 
them) depends a lot on how many interconnections and dependencies you have to 
deal with, and these often can be reduced by judicious clustering into modules. (Part 
of his thesis was a Fortran program he wrote to try to find clusterings that had 
minimal outs-and-ins.) This is tough to do for architecture, and he moved on to “the 
timeless way of designing” after moving from Harvard to Berkeley in the 60s. 
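
As a toy illustration of looking for clusterings with minimal "outs-and-ins" (this is not Alexander's Fortran program, and the brute-force search is only workable for a handful of parts), the sketch below counts the dependencies that cross a proposed boundary and picks the two-way split with the fewest. The function names and the example graph are made up.

```python
from itertools import combinations


def crossings(edges, cluster):
    """Count dependencies with one end inside the cluster and one end outside."""
    return sum((a in cluster) != (b in cluster) for a, b in edges)


def best_split(nodes, edges):
    """Try every two-way split and keep the first one with the fewest crossings."""
    nodes = sorted(nodes)
    best = None
    for size in range(1, len(nodes) // 2 + 1):
        for group in combinations(nodes, size):
            score = crossings(edges, set(group))
            if best is None or score < best[0]:
                best = (score, set(group))
    return best


# Two tightly knit triangles joined by a single dependency (c-d).
parts = ["a", "b", "c", "d", "e", "f"]
deps = [("a", "b"), ("b", "c"), ("a", "c"),
        ("d", "e"), ("e", "f"), ("d", "f"),
        ("c", "d")]

score, cluster = best_split(parts, deps)
print(score, sorted(cluster))
```

The search finds the "joint" between the two triangles: cutting there severs one dependency, while any other cut severs at least two.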


But there is a lot of gold in this book for computer people to ponder (as indeed we 
did in the latter part of the 60s and then at Parc). If a system is thought of as “clusters” 
inter-connected with communications, then it is natural to think of the interior of a 
cluster as also a system. This gives a natural decomposition, and it also suggests that 
the clusters need whole computers to model them, and things like networks to allow 
them to communicate (we can see another source of ideas for “real objects” back 
then). It is a simple jump to use software "real-objects-that-are-whole-computers” as a 
universal building block. Some of Alexander's ideas about constraints, dependencies, 
interactions, etc. can be used to try to find clusters that “minimize difficulties”. 


Most of these ideas were around in one form or another, and the fun of Smalltalk was 
to see how far they could be taken in the early 70s on a very small but very nice 
computer. 


An important principle here is that there are techniques for not only making 
programming languages, but for making them fairly rapidly. If computer people don't 
know them, they will be tempted to use something that is around rather than to try to 
think things through again in the current context. (This is important because as Butler 
Lampson likes to point out, Moore's Law means that a lot of your basic design rules
actually change every few years, so you have to be very careful about what you think 
you know (it could have gotten obsolete right under your nose).) 


So I would start from *today-into-a-10-year-future* and think things through again
from scratch. 


Certainly, it would be crazy not to think of “the computer” as the whole Internet. A 
start on this would be to look at Dave Reed's 1978 MIT PhD thesis on a design for a 
“whole Internet” OS. (And you can look at “Croquet” and its open source version 
“Open Cobalt” to see implementations of some of these ideas.) 


To keep this from getting even longer, I'll just pick a few things out of more. 


I'd most definitely try to materially advance the state of the art in “dependencies 
management" along every dimension. Even with great clusterings to make things 
more understandable, there are still many dependency problems. I'd make a CYC-like 
“expert system about systems” to help here. Scaling means that names get less and 
less useful, and active negotiations between modules using descriptions and safe 
testing get more and more important (the expert system would help here also). I
would most definitely incorporate various kinds of “pseudotime” (McCarthy, Reed, 
Jefferson, Lamport, etc) to eliminate most race conditions on objects’ knowledge 
(especially distributed replicated objects’ states and processes) and allow “world-lines” 
for “capturing time”. 
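
One loose way to picture "pseudotime" is an object that never overwrites its state but files each value under a logical timestamp, so that readers asking about a given pseudotime get the same answer no matter what order the writes arrived in. The sketch below is an assumption-laden illustration, not the scheme of any of the authors named above; `VersionedCell` and its methods are hypothetical names.

```python
import bisect


class VersionedCell:
    """Keeps every value it has held, keyed by the pseudotime of the write."""

    def __init__(self, initial):
        self._times = [0]          # pseudotimes, kept sorted
        self._values = [initial]   # value written at the matching pseudotime

    def write(self, pseudotime, value):
        # Writes may arrive in any real-time order; they are filed by pseudotime.
        i = bisect.bisect_right(self._times, pseudotime)
        self._times.insert(i, pseudotime)
        self._values.insert(i, value)

    def read(self, pseudotime):
        # A read sees the latest write at or before the requested pseudotime.
        i = bisect.bisect_right(self._times, pseudotime) - 1
        if i < 0:
            raise ValueError("no state exists before pseudotime 0")
        return self._values[i]

    def world_line(self):
        # The full history of the object, in pseudotime order.
        return list(zip(self._times, self._values))


balance = VersionedCell(100)
balance.write(20, 70)    # arrives first in real time
balance.write(10, 150)   # arrives later, but belongs earlier in pseudotime

print(balance.read(15))
print(balance.read(25))
print(balance.world_line())
```

Nothing is destroyed by a write, so the history returned by `world_line` can be inspected or replayed, which is the sense of "capturing time" in this sketch.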


Back to the beginning here. The tendency has been for some “operating systems” to 
start off clean and minimal — like a good belief system — but then the acolytes start 
to worship it and start adding junk doctrine, especially ways to avoid doing needed 
work. (Some operating systems started off like a pop culture and never were clean and 
minimal!) 


It should be easy to see that in this day and age that computations need to be in 
terms of systems (as described above) and that they need to be automatically mapped 
into various physical computers in a variety of ways. This means that the job of the 
resident software on a piece of hardware is to do almost nothing except help with 
protection and allocation of time and space and other resources. It shouldn't have to 
know about the computations it is running (that's part of what protection is all about), 
and even more important, the computations had better not have to know in the 1st 
order about where they are running. As far as their resources are concerned, they 
should be in terms of more modules, which can be delivered to a physical hardware if 
necessary. Part of the space allocations by the tiny kernel could help with caching, etc. 


The idea behind the Internet (the most successful “real object” system in the world of
computing) is that the most important agreements should not be in terms of what's in
the nodes but in terms of how the intercommunication between the nodes should be
done. It's worth trying to base future thinking on enlargements of that idea. From this
standpoint, we can see that the “interesting language” is the intermodule one.


We could have done this better when doing Smalltalk, but it was still an interesting
first pass. Once done, Smalltalk shows how to use an intermodule language as a way
of doing things recursively “all the way down” for making systems. But it doesn’t at all
preclude having any other kind of language or mechanisms inside an object. That is a
large part of the reason for having encapsulated “clusters” be thought of as “objects”.


Alan Kay
I was the main designer of the Parc GUI - 5y




What exactly is WYSIWYG? 
What I am doing right now is *not* WYSIWYG.


For no good reason, and several bad ones, the Quora text input system (and most 
others in web browsers) makes you type into a separate area that is decoupled from
where it will wind up and what it will look like. Quora does do a bit of decoration (vs 
e.g. Hacker News) but it is still not what I want to use (and it is not what we first made
work because at Xerox Parc we wanted to work in “terms of our results”). This also 
included what is now called “live programming” (at Parc it was just “programming”).


Contrary to some of the other answers, it was not primarily about printing, but about 
working in real-time in terms of your results. One of the metaphorical questions we 
asked was “Imagine if you played a flute and had to wait seconds or minutes to hear 
and make better”. This is part of another Parc driving principle: "No Modes". What's 
interesting is that one of the modes we got rid of (“insert mode” vs “replace mode") 
for text has been retained. But non-WYSIWYG interactions impose an annoying and 
not needed mode between authoring and experiencing. 


A little aside to give our definition of “modeless”. It recognized that something like a 
mode — a window showing something specific — can be very useful, but what is 
needed is to be able to get to the next thing you want to do without having to explicitly 
get out of what you are doing. For example, a window with something in it is a
quasi-mode, but because you can start any other action directly (e.g. clicking on anything)
without having to say “get out of this window”, we would call it “modeless”. In text 
editing you don’t want to have to get out of “inserting” in order to “replace”. In 
programming, you don't want to go through an edit in an editor, submit to a compiler,
which submits to a loader, which requires your system to initialize, and so forth. You
just want to be able to deal directly and safely with what you are trying to achieve. Etc. 
Since people make mistakes and also want to explore options, you also need a 
comprehensive UNDO. 


Another aside: note just how really bad are the UIs on so-called “smart phones” (they
are brain dead). 


So “What You See Is What You Get" is a very big and very important part of personal 
computing, which includes the idea of “symmetric ‘reading’ and ‘writing’”. For no
good reasons, all this was thrown away when the web and the web browsers were 
done. Ironically, the web was done on machines that already had good WYSIWYG 
applications that had been around for a decade or two. 


The large idea was an ARPA-IPTO idea. At Parc, when we decided to sacrifice a lot of 
our precious RAM to do a bit-mapped display, we realized that by going to this
omni-display we could get really good fidelity, and results-mode interactions if we were
willing to do the work to make it happen. (I wish more computer people could take
the gift of personal computing more seriously and not find reasons to cripple the 
ideas.) 


In my recollection, the term was made up at Parc by Charles Simonyi when he was 
Butler Lampson’s grad student and did what became Microsoft Word (called Bravo 
there) as the example for his PhD thesis on “intentional programming” ca 1974-75. 


Alan Kay
Still trying to learn how to think better - 5y


Could a modern-day Xerox PARC be funded through Patreon? 


There's coming up with the funding, coming up with the visions, and coming up with 
the people. 


Butler Lampson estimated that the number of computer researchers at Parc during its 
most productive early years was “about two dozen”. I.e. “tiny”. I think there were about
40 or so in its second five years (its total span was about 11-12 years depending on
how you count). 


The biggest difference monetarily between then and now is that the house prices in 
Palo Alto increased about 10 times faster than inflation. In 1971, the modest salaries at 
Parc allowed one to buy a house, whereas the comparable salaries today would not. 
On the other hand the amount spent per year per researcher for computer resources 
(most of which we had to make ourselves) was enormous. 


I mention this because it is more complicated to translate the estimated (small)
budget for Parc back then to today. A simple translation for the computer part would 
be roughly $10M/year today, but more compensation would have to be added, let's 
say to around $12M-$15M/year.


Parc made a lot of progress by being willing to build most of its HW and SW from 
scratch. One of the large parts of that process was to make various kinds of hardware 
accelerators for software, including programming languages, to try to limit the need 
for optimization in the software. That would still be necessary today, but the HW side 
especially would be quite different. Some of how we used fast discrete components to 
fashion microcoded processors could be done in FPGAs today. But I think that the
ability to quickly do state of the art custom chips would be needed. 


Whatever the final budget might come to, it would still not be large. I think it could be
too large pragmatically for Patreon to muster, but not impossible. The need would be 
for at least a 5 year commitment on the part of the funders. 


Parc had a lofty and large vision about universal personal computing and 
internetworking for the improvement of civilization that it brought from its roots in 
the ARPA-IPTO computing research community of the 60s. Graduate students within 
ARPA naturally thought in these terms and scales, including the idea that most things 
would have to be built from scratch.


I think both a large enough vision and the special people would be the hardest to
come up with these days. 


It is certainly hard to locate visions that are anything like the ARPA/Parc ones in scope 
and romance. There are a few scattered wonderful projects* but not a community. 


It's worth contemplating this dismal state in today’s world of tiny increments to what 
people might want to buy rather than what they actually need. And especially the 
oddity that 40-50 years ago when it was really difficult to make computers and do 
things with them, people did quite wonderful things — whereas today the computing 
resources are abundant, but most projects are pretty trivial. Most that still use old 
grand terms — e.g. like “AI” — have discarded what's important and difficult about
what the terms used to mean — in order to concentrate on the easier to do subsets. 


* e.g., take a look at what Bret Victor is now doing ...


As for the people, it’s certainly the case that there are numerically many times more 
computing people around with the “IQs” (if I can use that term for a certain kind of
mental power rating) of the Parc researchers. 


On the dimension of “creativity” there are enough super-creative people to draw from. 


But perhaps the most unusual aspect of the ARPA/Parc culture was that it was 
strikingly more cooperative and more able to get synergy from colleagues than my 
perceptions of today. This is too large a subject for this note. Getting that synergy to 
happen was a deliberate part of the ARPA/Parc process, and to pick just one name — 
Bob Taylor — there were real efforts to make it happen. (I need to mention that this 
was far from perfect — but it was quite a bit better than processes I'm aware of 
today.) 


Trying to sum up: when I think about this stuff in relation to questions about Parc, I
keep coming back to the enormous and I think necessary influence of the ARPA
community on Parc, from the standpoints of perspective, desires, vision, methods, 
people, and colleagues to continue working with after the fact. 


In several lectures I gave recently at Stanford, I tried to sum up by saying “the
goodness of the results correlated most strongly with the goodness of the funders”. 


Alan Kay
Have designed a few programming languages - 6y
Why is it possible to use an object as a parameter directly? 


In Smalltalk, everything is an object and it does pass parameters, so ... 


As to “why?”, it's worth thinking about what a data structure “knows”, what the 
receiver of a data structure needs to know, what an object knows, and what the 
receiver of an object needs to know. 


If you can see that the answers are: not a lot, a lot, and a lot and not a lot, then you 
have answered your question. 


You can start to see why using objects to simulate data structures is not a great idea at 
best, and gets worse and worse as your system gets richer and larger. 
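
A small Python contrast may help with the "who needs to know what" question. Everything here is hypothetical (the temperature example, `Temperature`, `is_freezing`, and the two receiver functions); it is a sketch of the distinction, not a claim about how any particular system is written.

```python
# Data structure: it "knows" nothing, so every receiver must know its layout and units.
reading = {"value": 72.5, "unit": "F"}


def is_freezing_from_data(d):
    # The receiver has to understand the representation in detail.
    celsius = (d["value"] - 32) * 5 / 9 if d["unit"] == "F" else d["value"]
    return celsius <= 0


class Temperature:
    """Object: knowledge about the representation stays inside."""

    def __init__(self, value, unit):
        self._celsius = (value - 32) * 5 / 9 if unit == "F" else value

    def is_freezing(self):
        return self._celsius <= 0


def report(t):
    # The receiver only needs to know which message to send.
    return "freezing" if t.is_freezing() else "not freezing"


print(is_freezing_from_data(reading))
print(report(Temperature(72.5, "F")))
print(report(Temperature(-3, "C")))
```

If the internal representation of `Temperature` changes (say, to Kelvin), `report` is untouched, whereas every function written like `is_freezing_from_data` would have to be found and fixed.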


Alan Kay
Have designed a few programming languages - 6y


What resources does Alan Kay recommend for learning real object oriented 
programming? 


Following on to Mark Miller's answer 


One resource for morphogenesis would be “Endless Forms Most Beautiful” by Sean 
Carroll. 


We get children started on systems by having them make systems with lots of parts — 
like epidemics, the foxes and the rabbits, “ants”, ecologies, etc. It is pretty easy to 
model some systems things like “Ethernet” in a child's language such as Etoys. 


The basic idea here is that “simple programs are too simple to learn about 
programming, design, systems, etc.” On the other hand, complex programs are too 
hard to do when you are a beginner. 


A sweet spot is massively large systems with relatively simple interactions. A number 
of good ones are found in Mitchel Resnick's book “Turtles, Termites, and Traffic Jams”,
which introduces StarLogo.
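
For a feel of "many parts, simple interactions", here is a minimal epidemic sketch in Python rather than Etoys or StarLogo. The rule, the parameter values, and the names (`step`, `contacts_per_agent`, `chance`) are assumptions made for the example; it is the kind of uniform, data-like model that the next paragraph calls SIMD.

```python
import random


def step(infected, contacts_per_agent, chance, rng):
    """One tick: each infected agent meets a few random agents and may infect them."""
    n = len(infected)
    newly = set()
    for agent in range(n):
        if infected[agent]:
            for _ in range(contacts_per_agent):
                other = rng.randrange(n)
                if not infected[other] and rng.random() < chance:
                    newly.add(other)
    # Apply the new infections only after everyone has acted this tick.
    for agent in newly:
        infected[agent] = True
    return len(newly)


rng = random.Random(1)           # fixed seed so runs are repeatable
population = [False] * 1000
population[0] = True             # a single initial case

history = []
for tick in range(30):
    step(population, contacts_per_agent=3, chance=0.2, rng=rng)
    history.append(sum(population))

print(history)
```

Each agent follows one trivial rule, yet the printed counts trace out the behavior of the whole population, which is the kind of system-level effect a beginner can explore without writing a complex program.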


Many of these are SIMD, so they don't have enough interactions to get beyond “data”. 


What's needed are examples where a simple “data-centric” approach is too fragile (we
need to get beyond SIMD).


A good “real oop" language will allow: 


• children’s examples of this kind to be done very straightforwardly


• simpler examples to easily be done staying with “real oop” rather than
degenerating into simple data and procedure programming 


• rather complex systems — like operating systems, simulations of complex
engineering, scientific, and biological systems to be done without adding 
unnecessary complexity 


Alan Kay
Have designed a few programming languages - 6y
What does Alan Kay think of the Swift programming language? 


I don't think my opinion matters. But it’s worth looking at the stated goals for the
particular design and implementation of any programming language and (a) see how 
well they match up to the system in use, and (b) what important goals have been left 
out. 


For example, for C, (a) it wasn't too bad for its original goals, and (b) almost nothing
that has been written in it — especially by humans — in the last 25-30 years or so
should have been. This is not a slam on C but on the attempts by humans to use a
tool of particular range and use in areas — like large scale software systems — where
it is not at all suited.


Most languages do more or less well at (a) but not so well at (b) and usually get 
increasingly worse at (b) over time. 


So — for Swift — I think we could almost ignore (a) and look to see how it does for
(b). 


What do you think? 


Alan Kay 
Have designed a few programming languages - 6y 


If Tony Hoare said that “debugging is harder than programming and you 
shouldn't use all of your cleverness to write the program” — or make 
anything — then what role does planning have in engineering and software 
design? 

The famous Tacoma Narrows bridge that vibrated itself to bits in a strong wind was 
planned, but the means to do a good simulation did not exist when it was built. Today, 
we would expect that a bridge we built would anticipate most stresses. The old joke 
was that “But test pilots still wear parachutes!”. With today’s planes, a parachute might 
not help at all, so more care needs to be taken even when testing (and better 
simulators help every one and every thing). 


It's hard to tinker a great sculpture from malleable clay just by debugging. Planning is 
a must in most art. But the process is also a finding, so we have to preserve the 
paradoxes that “plans have to be debugged” and “engineering has to still learn in the
process of building”. 


In my experience, we had quite a bit of luck in first debugging our ideas before
worrying about the amount of optimization necessary for practical use. Since some of 
our ideas involved real-time interactions by humans, we needed to do our 
experimental work on individual super-computers that were fast enough not to need 
optimizations, and we generally had to design and build these as part of the process. 


One way to think about this part is that it is like the “SIM” part of CAD->SIM->FAB in 
modern engineering. We can SIM the ideas and much else of software if we have extra 
capacity. For example, the “Worlds” system of Alex Warth et al. allows fine grained
UNDO and “possible worlds reasoning” at most levels of execution. This allows a 
“cradle” to be built around the project that will allow every event to be captured and 
studied if necessary. 


Finally (for here), it’s worth thinking about what good tools for reformulation of 
programs might look like. I think I'd start with where Smalltalk left off, find the next set
of improvements in dealing with always running code, and think a lot harder about 
how to deal with dependencies in graceful ways, and especially how desired 
dependencies can be preserved while reformulating. 


Alan Kay
Have designed a few programming languages - Updated 6y
Can Smalltalk code be written on "bare metal" hardware? 




I'm not sure I understand this question, but -sure-!


When a bunch of us were at HP in the mid-2000s we did a bakeoff for a children’s 
machine (a few years before OLPC) where there was not really enough room for the 
speed, graphics ability, and low power consumption. We invited many vendors to 
bring their boards with the CPUs installed. Overnight we would bootstrap Smalltalk (its 
Squeak version which was particularly aimed at “bit-identical” bootstrapping) and 
besides running our own benchmarks, we could see what it would be like to interact 
with comprehensive GUIs and graphically heavy apps.


(The CPU we chose was also the one — for similar reasons — that was chosen a few
years later for the OLPC XO.) 


The route to that kind of bootstrapping really depends on what's already there (i.e. 
how “bare” the hardware is). It is fun to think about how to do it from absolutely bare 
metal, but this is never the case these days. There is always a C compiler at least. In 
fact Squeak was written in itself, and the critical parts of this were a subset that could 
be translated into C. So bootstrapping was really dealing with various resources 
provided by the vendor board. 


An historical note. There was a guy by the name of Ted Steel in the 60s whose hobby
horse was UNCOL (universal computer oriented language) to solve the n*m problem. 
It was kind of an abstract assembly code. Instead, C for good and ill wound up being 
the UNCOL of the last 20+ years. 


Ian Piumarta has done a number of “from nothing” bootstrap systems.


Dan Ingalls and I had a very pleasant conversation a few weeks ago on the way back
from visiting Bletchley Park about whether you could — with today’s large RAM — do
a higher level bootstrap “fast enough” so that the garbage collector could be written
in the higher level language before you ran out of memory ...


Again, the existence of C makes this more of an artistic gesture rather than one of 
desperation. 


Another conversation we had — which included the work of Alex Warth — was a 
better minimal bootstrap of Val Schorre’s Meta II system. Alex did one that was nice (it
turned into his UCLA thesis), but I think there is still gold to be mined in that POV.


I returned to this (Oct 27th) because there are a few more cases to ponder.


1. Let us suppose that the machine doesn't have any software yet, what then?
When I was working my way through the U of Colo as a programmer for the
National Center for Atmospheric Research (NCAR) in the early 60s, the
powers that be decided to get a Control Data 6600, one of the first real
supercomputers. The head programmers at NCAR were certain that CDC
would not be on time with their SW (there had been much slippage with the
current 3600 machine at NCAR). So they decided to do much of the systems
software themselves just in case. My job was to take the manuals of the 6600
and write an efficient simulator of the machine for the 3600. I did, and they
did, and so we had a fair amount of debugged system software ready to be
loaded on the 6600 when the bare machine started working. (The rest of this
story is fun, but off topic, but I should mention that one of the benefits of a
simulator is that you can put in a lot more facilities for dynamic debugging
than the real HW easily allows.)


2. BCPL, the predecessor of C, was in the form of a byte-coded machine that
one would first get going on the new HW, and it had the facilities to then
write more efficient machine code generators and recompile itself, so it could
work as a systems language on the new HW. One of the ways to bootstrap
BCPL, was to simulate the new HW (as above), and get a lot of the new
system running before even the new HW got going.
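To make the "byte-coded machine" idea concrete, here is a minimal sketch of a stack-machine interpreter in Python. It is not BCPL's actual intermediate code; the opcode names (`PUSH`, `DUP`, `ADD`, `SUB`, `MUL`, `JNZ`) and the `run` function are invented for illustration. The point is only that the part which must be brought up on new hardware, or written as a simulator on old hardware, can be a very small loop.

```python
def run(program):
    """Interpret a list of (opcode, operand) pairs on a tiny stack machine."""
    stack = []
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "DUP":
            stack.append(stack[-1])
        elif op in ("ADD", "SUB", "MUL"):
            b = stack.pop()
            a = stack.pop()
            if op == "ADD":
                stack.append(a + b)
            elif op == "SUB":
                stack.append(a - b)
            else:
                stack.append(a * b)
        elif op == "JNZ":
            # Pop the top of the stack; jump to instruction `arg` if nonzero.
            if stack.pop() != 0:
                pc = arg
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


# (2 + 3) * 4
arithmetic = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
# Count down from 3 to 0 with a backward jump.
countdown = [("PUSH", 3), ("PUSH", 1), ("SUB", None), ("DUP", None), ("JNZ", 1)]

print(run(arithmetic))  # [20]
print(run(countdown))   # [0]
```

Everything written in the byte code runs unchanged wherever this loop runs, which is why a simulator or a quick port of the loop is enough to start bringing up the rest of the system.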


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - Updated 1y


What are some good books on history of computing? 


This is a very good question! I’m in the UK for a while right now and don't have access
to my home library, so I'll add to this answer incrementally as I think of more items.
However, I also invite all readers of this to provide some of their favorite history
sources.


I will confess that I don't know of a great “master comprehensive” book for the whole
field. We should really try to find one — I think it would be of the most use for the
Quora audience.


——- added June 2022 ——- 
I should have included some old classics about old machines, e.g.

• “Giant Brains” by Ed Berkeley
• “Faster Than Thought” — B.V. Bowden
• “Computing in the 20th century” — Nick Metropolis
• “Whirlwind”
• “AN/FSQ-7” (Sage)
• “LEO”


One area to look at is what ACM has made happen in terms of histories. 


• There are three “History of Programming Languages” books that cover many
of the major ones.
• There is a “History of Workstations” book with contributions by many of the
major inventors of personal computing.
• There is a book of the first 25 Turing Award lectures. The other Turing
lectures can be found on the ACM site.
• and quite a bit more


I'm not so thrilled about the Computer History Museum time-lines, etc. but they have 
a lot of stuff. The “Fellows” elected are arguably a more representative list of names 
for the history of computing since the 50s than the Turing Award. Running down their 
backgrounds would reveal a lot. 


The best book for ARPA-IPTO/Xerox Parc is 


• “The Dream Machine” by Mitchell Waldrop


Other books | can think of (let’s run down the titles) 


• Maurice Wilkes’ “memoirs” - includes history of EDSAC
• I think there is a book about Konrad Zuse — maybe “the first” modern
designer builder
• There's the Atanasoff Berry computer — might have predated Zuse’s
• History of Whirlwind
• There are a number of pretty good Babbage/Ada histories


More to come. ... (and let's have more contributions ...!) 


Alan Kay
Have designed a few programming languages - 6y


How does a program like Microsoft Excel get written from scratch? 


One good answer is “find out how Dan Bricklin and Bob Frankston did the original
spreadsheet Visicalc on the 8-bit micro Apple II”.


Another approach is that the abstraction of a spreadsheet is simple, so make a 
prototype using any language that can make instances. This will run very slowly, and 
will reveal that a lot of what a useful spreadsheet is about is the optimizations for both 
calculating and for viewing. 
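A minimal sketch of such a prototype, in Python, might look like the following. The class names `Cell` and `Sheet` and the use of lambdas as formulas are assumptions made for the example, not how Visicalc or Excel work internally. A cell is an instance that holds either a constant or a formula over the sheet, and every read recomputes from scratch.

```python
class Cell:
    """One spreadsheet cell: either a constant or a formula over the sheet."""

    def __init__(self, sheet, value=0, formula=None):
        self.sheet = sheet
        self._value = value
        self.formula = formula  # callable taking the sheet, or None

    @property
    def value(self):
        # Naive on purpose: recompute on every read, with no caching
        # and no tracking of which cells depend on which.
        if self.formula is not None:
            return self.formula(self.sheet)
        return self._value


class Sheet:
    """A dictionary of named cells."""

    def __init__(self):
        self.cells = {}

    def set(self, name, value=0, formula=None):
        self.cells[name] = Cell(self, value, formula)

    def __getitem__(self, name):
        return self.cells[name].value


sheet = Sheet()
sheet.set("A1", 10)
sheet.set("A2", 32)
sheet.set("A3", formula=lambda s: s["A1"] + s["A2"])
sheet.set("B1", formula=lambda s: s["A3"] * 2)
print(sheet["B1"])  # 84

sheet.set("A1", 20)
print(sheet["B1"])  # 104
```

Reading `B1` re-evaluates `A3`, which re-reads `A1` and `A2`, every time. With thousands of cells and deep dependency chains this becomes very slow, which is where caching, dependency ordering, and redrawing only what changed come in.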


While you are at it, try to think about what horrendously got missed going from 
Visicalc ca 1978-9 to Excel, which was being prototyped ca 1984. 


For ideas about spreadsheets you could also look at my Scientific American article
“Software” for Sept 1984.


Alan Kay
Have designed a few programming languages - Updated 6y


What is software architecture in layman's terms? 


When a term is borrowed from somewhere else — for example: “science”, 
“engineering”, or “architecture” — it is usually important to start with what the terms
meant in the developed serious disciplines they were taken from. So we should ask 
“what is modern science?” and try to critique “computer science” partly on that basis. 
Similarly, we have “software engineering” — we need to look at what real engineers in 
the civil, electrical, aviation, biotech, etc disciplines actually do, how they think about 
it, how they go about it. 


The exercise we just did will reveal that the two terms above are at the very best 
-aspirations- and at the worst the Emperor's New Clothes. I.e. a much too high BS
component. 


When I look at “architecture” I see a wide range of design activities, that range from
the utilitarian to the beautiful, and from sculpture to highly synergistic new non-linear 
combinations of basic materials to allow new kinds of designs to be thought about, 
and to then be built. 


In “architecture” the “arch” doesn't refer to “an arch” — it could, but it doesn’t — but 
to “the top, the best, etc”: it’s a superlative of a kind — like Moriarty being Holmes’ 
arch-enemy. The “tecture” refers to building. 


Below I start to drift away from “layman's terms”, but I'd like to mention a beautiful
book (for everyone) that gives an excellent “layman's sense”: “Cathedral” by David
Macaulay. The only area that doesn't quite have enough narrative — especially for the
child readers — is how and why arches work, and just where the forces arise that need 
something like flying buttresses as a solution (architects have to think about these 
kinds of things also). 


The good news is that computing doesn't entirely lack examples of great science, 
great engineering, and great architecture. But good and great examples can be hard 
to find. Sometimes they are relative to their time, and sometimes they are timeless. 


Some of my personal favorites are Bob Barton's Burroughs B5000 computer (more 
than a half dozen of the best SW ideas ever in the actual hardware of a higher level 
computer), John McCarthy's Lisp (the invention of a new kind of computer stuff (better 
bricks are also part of architecture)), Butler Lampson’s Project Genie OS, Dan Ingalls’ 
Smalltalks, Gerry Popek’s Locus OS, etc. 


And of course, the Internet (and its earlier relatives, the ARPAnet, the Ethernet, and the 
AlohaNet). But not the web or the web browser. 


I've gone a bit beyond “buildings” in the above. So one question we could ask is “what 
do we have to do to grow a beautiful city?” And “what do we have to do to keep it 
beautiful?” And “can we grow it larger and keep it beautiful?” 


These are good questions also with respect to SW, because sometimes a beautiful
system gets done, gets popular, and then gets quite messed up in the process. 


Alan Kay
Have designed a few programming languages - 6y

What is the best way to promote a non-traditional Smalltalk programming 
language implementation? 


I don't know, I've never tried to promote any kind of programming language.
Certainly, the things that have attracted me to various programming languages have
mainly been compelling examples. I think this is the place to start.


Alan Kay
I am the Alan Kay in question. - Updated 6y


What did Alan Kay mean by, "Lisp is the greatest single programming 
language ever designed"? 


First, let me clear up a few misconceptions from the previous answers. One of them
said “Try writing an operating system with Lisp”, as though this would be somehow
harder. In fact, one of the nicest operating systems ever done was on “The Lisp
Machines” (in Zeta-Lisp), the hardware and software following the lead of “The Parc
Machines” and Smalltalk — and we in turn had been very influenced by the Lisp
model of programming and implementation. (These operating systems in both
Smalltalk and Lisp were both better (claim) and easier to write (simpler to
demonstrate) than the standard ones of today.)


Another interesting answer assumed that “the test of time” is somehow a cosmic 
optimization. But as every biologist knows, Darwinian processes “find fits” to an 
environment, and if the environment is lacking, then the fits will be lacking. Similarly, if 
most computer people lack understanding and knowledge, then what they will select 
will also be lacking. There is abundant evidence today that this is just what has 
happened. 


But neither of these has anything to do with my praise of Lisp (and I did explain what I
meant in more detail in “The Early History of Smalltalk”).


To start with an analogy, let's notice that a person who has learned calculus fluently 
can in many areas out-think the greatest geniuses in history. Scientists after Newton 
were qualitatively more able than before, etc. My slogan for this is “Point of view is 
worth 80 IQ points” (you can use “context” or “perspective” etc.). A poor one might 
subtract 80 IQ points! (See above). A new more powerful one makes some thinking 
possible that was too difficult before. 


One of our many problems with thinking is “cognitive load”: the number of things we
can pay attention to at once. The cliche is 7±2, but for many things it is even less. We
make progress by making those few things be more powerful. 


This is one of the reasons mathematicians like compact notation. The downside is the 
extra layers of abstraction and new cryptic things to learn — this is the practice part of 
violin playing — but once you can do this, what you can think about at once has been 
vastly magnified. There were 20 Maxwell's Equations in their original form (in terms of 
partial differentials and cartesian coordinates). Today the four equations we can think 
about all at once are primarily due to their reformulation by Heaviside to emphasize 
what is really important about them (and what is likely to be problematic — e.g. the 
electric and magnetic fields should probably be symmetric with respect to movement, 
etc). 


Modern science is about experiencing phenomena and devising models whose 
relationships with the phenomena can be “negotiated”. The “negotiation” is necessary 
because what's inside our heads, and our representation systems etc. have no
necessary connection to “what's out there?”.


Taking this point of view, we can see there can be a “bridge science” and “bridge
scientists” because engineers build bridges and these furnish phenomena for scientists
to make models of. 


Similarly, there can be a “computer science” and “computer scientists” because 
engineers build hardware and software and these furnish phenomena for scientists to 
make models of. (In fact, this was a large part of what was meant by “computer 
science” in the early 60s — and it was an aspiration — still is — not an accomplished 
fact). 


The story behind Lisp is fun (you can read John McCarthy's account in the first History
of Programming Languages). One of the motivations was that he wanted something
like “Mathematical Physics” — he called it a “Mathematical Theory of Computation”.
Another was that he needed a very general kind of language to make a user interface
AI — called “The Advice Taker” — that he had thought up in the late 50s.


He could program — most programs were then in machine code, Fortran existed, and 
there was a language that had linked lists. 


John made something that could do what any programming language could do
(relatively easy), but did it in such a way so that it could express the essence of what it
was about (this was the math part or the meta part or the modern Maxwell's
Equations part, however you might like to think of it). He partly did this — he says —
to show that this way to do things was “neater than a Turing Machine”.


Another observation about this is that the “slope” from the simplest machine 
structures to the highest level language was the steepest ever — meaning that the 
journey from recognizable hardware to cosmic expression is a rocket jump! 


As is often the case — especially in engineering — a great scientific model is often 
superior to what exists, and can lead to much better artifacts. This was certainly true 
here. Steve Russell (later famous for being the main inventor and programmer of
“SpaceWar”) looked at what John had done, and said: “That's a program. If I coded it
up we'd have a running version”. As John remarked: “He did, and we did”!


The result was “unlimited programming in an eyeful” (the bottom half of page 13 in 
the Lisp 1.5 manual). The key was not so much “Lisp” but the kinds of thinking that 
this kind of representational approach allowed and opened up regarding all kinds of 
programming language schemes. 
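To give a feel for how small such a self-describing core can be, here is a minimal sketch of an evaluator in Python. It is not the definition from the Lisp 1.5 manual; it is a loose modern paraphrase of the same idea, and the representation (Python lists for expressions, strings for symbols) and the set of built-ins in `global_env` are assumptions made for the example.

```python
import operator


def evaluate(x, env):
    """Evaluate expression x (nested lists, strings, numbers) in env."""
    if isinstance(x, str):          # a symbol: look up its value
        return env[x]
    if not isinstance(x, list):     # a number: evaluates to itself
        return x
    op, *args = x
    if op == "quote":               # (quote e) returns e unevaluated
        return args[0]
    if op == "if":                  # (if test then else)
        test, then, alt = args
        return evaluate(then if evaluate(test, env) else alt, env)
    if op == "lambda":              # (lambda (params) body) makes a function
        params, body = args
        return lambda *vals: evaluate(body, {**env, **dict(zip(params, vals))})
    fn = evaluate(op, env)          # otherwise: apply a function to arguments
    return fn(*[evaluate(a, env) for a in args])


global_env = {
    "+": operator.add, "-": operator.sub, "*": operator.mul, "<": operator.lt,
    "car": lambda lst: lst[0],
    "cdr": lambda lst: lst[1:],
    "cons": lambda head, tail: [head] + tail,
}

# Define factorial in the little language itself.
global_env["fact"] = evaluate(
    ["lambda", ["n"],
     ["if", ["<", "n", 2],
      1,
      ["*", "n", ["fact", ["-", "n", 1]]]]],
    global_env,
)

print(evaluate(["fact", 5], global_env))                     # 120
print(evaluate(["car", ["quote", [1, 2, 3]]], global_env))   # 1
```

Symbols, constants, a conditional, function creation, and function application are the whole evaluator, and new languages can be explored by changing a few lines of it.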


A fun thing about this is that once you've grokked it, you can think right away of
better programming languages than Lisp, and you can think right away of better ways 
to write the meta descriptions than John did. This is the “POV = 80 IQ points” part. 


But this is like saying that once you've seen Newton, it becomes possible to do 
electrodynamics and relativity. The biggest feat in science was Newton's! 


This is why “Lisp is the greatest!” 




Is there a programming language that's effectively a successor to Smalltalk? 


Alan Kay 
Still trying to learn how to think better - 6y 




Well, let's see. The first usable Smalltalk was designed and implemented by the end of 
1972. That makes it 45 years old, and the main ideas about OOP I contributed go back
to the end of 1966 (that makes this particular line of thought — dynamic 
programmable OOP — 51 years old). If Moore's law represents a doubling every 18 
months to 2 years (depending on what you look at), then the changes in combined 
scales since then are probably at least a factor of 100,000,000. 
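The arithmetic behind this kind of estimate can be checked with a few lines of Python. This is only a sketch of the doubling calculation under the two doubling periods mentioned above; the function name `growth_factor` is made up for the example, and "combined scales" in the text may include more than transistor counts.

```python
def growth_factor(years, doubling_period_years):
    """Factor gained after `years` if capacity doubles every period."""
    return 2 ** (years / doubling_period_years)


for years in (45, 51):
    for period in (1.5, 2.0):
        factor = growth_factor(years, period)
        print(f"{years} years, doubling every {period} years: {factor:.2e}")
```

Over 45 years, doubling every 18 months gives 30 doublings, which is 2^30 or about a billion. Doubling every 2 years over 51 years gives 25.5 doublings, a factor of roughly 47 million. A factor of 100,000,000 (about 26.6 doublings) falls between those two cases.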


The big ideas have fared better than the implementations we did back then, but quite 
a few of the more important future looking ones — for example, having objects be 
able to negotiate their interoperability with other objects — were barely thought about, 
and not worked on. 


In my mind, a real successor to Smalltalk is really needed, and it needs to be 
qualitatively a successor, not just a better version of the old ideas. 


Scaling does need to be done much better to deal with today and tomorrow. Another 
dimension that is very important is expressibility. Smalltalk at Xerox PARC was able to 
do its "OS", its graphics system and UI, its (live) development system, and a number of 
“personal computing abilities” involving media, high quality fonts, etc. in about 10,000
lines of always operating code (and given that this is a moving target, we could just as 
easily say 20,000 lines of code to make the same point). 


So, we should think about something important and large in functionality that a real 
successor of today or the near future could bring to life in 10,000 to 20,000 lines of 
code (vs e.g. code that is 1000 or more times larger). 


Certainly — harking back to Sketchpad — we should call for a deep set of abilities to 
go automatically from the "whats" to the "hows" via a constellation of integrated 
problem solvers under the hood. We really want to do a lot of the programming of the 
future in terms of “runnable requirements”, etc.


One big thing that we talked about, made some interesting experiments with, and
more recently made a general facility for (the “Worlds” system on the Viewpoints Writings
page), is the “simulation of time” for many purposes: “possible worlds reasoning”,
parallel transactional functional relationships from one world “level” to the next,
generalized UNDO, etc. This should definitely be done on the next serious effort for a
new programming language.
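One way to picture "possible worlds" and generalized UNDO is as layered scopes for state changes. The Python sketch below is not the Worlds system's actual implementation or API; the `World` class and its method names are assumptions chosen for illustration. Writes stay in the world where they are made, reads fall through to parent worlds, and a child world can be either committed or simply dropped.

```python
class World:
    """A scope for state changes, layered on top of a parent world."""

    def __init__(self, parent=None):
        self.parent = parent
        self.changes = {}  # only the writes made in this world

    def sprout(self):
        """Create a child world that sees this world's state."""
        return World(parent=self)

    def get(self, key):
        # Look in this world first, then fall back through the ancestors.
        world = self
        while world is not None:
            if key in world.changes:
                return world.changes[key]
            world = world.parent
        raise KeyError(key)

    def set(self, key, value):
        self.changes[key] = value

    def commit(self):
        """Push this world's writes into its parent."""
        if self.parent is None:
            raise RuntimeError("the root world has no parent to commit to")
        self.parent.changes.update(self.changes)
        self.changes.clear()


root = World()
root.set("x", 1)

trial = root.sprout()
trial.set("x", 2)
print(root.get("x"), trial.get("x"))  # 1 2

# Dropping `trial` without committing acts as an UNDO of everything done in it.
other = root.sprout()
other.set("x", 3)
other.commit()
print(root.get("x"))  # 3
```

Several child worlds can explore different futures from the same starting state, and only the one that works out needs to be committed.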


Similarly, we should call for a much better approach to how software development is 
done. Smalltalk pioneered a lot of IDE ideas (and we got a lot of ideas from several of 
the previous Lisps). But to be done seriously, software has to take engineering 
seriously, and should now look to see what the integrated CAD -> SIM -> FAB tools 
are like in the real engineering disciplines (civil, electrical, bio, aero, etc.) 


All in all, I may be missing something out in the hinterlands, but what I see when I
look out at the programming landscape is rather tiny little incremental improvements
(with occasionally some really bad regressions), but I don't see serious efforts to
invent “programming for the 21st century”.


We can blame bad funders for some of this. But I wonder what would happen if good
funding showed up. How many computer people are thinking about “what is actually
needed” rather than “what would be a little better”?


Alan Kay
Still trying to learn how to think better - Updated 4y

What does Alan Kay use in lieu of PowerPoint? 

I've given a lot of talks using Etoys (in Squeak), a programming environment we made 
for children (starting at Disney in the late 90s) that also had decent media design. The 
current day Scratch is a subset descendant of Etoys, but without many of the useful 
media abilities. Another interesting system that I used for many years was “Frank”, the
comprehensive system we built for the STEPS project in a few 10 thousand lines of 
code. 


Many of the talks I've given both use and require comprehensive end-user 
programming — and PPT does not supply this (despite having both the Smalltalks and 
Hypercard preceding it). So it's both annoying and maddening. A good talk of a sort 
can be given in PPT if one omits some of the terrible conventions — like bullet points, 
lots of text, etc — and concentrates on having the visuals complement the words 
rather than being a cheat sheet. 


My favorite recent talk using programmable media was the one a few years ago for 
the Ted Nelson tribute. For “Christmas fun” in 2014, several of us worked for a few 
weeks to resurrect Smalltalk-78 (the vintage of Smalltalk that Steve Jobs saw in the 
famous visits to Parc in 1979) from a file that had been rescued from a disk pack that 
Xerox had thrown away. This was the Smalltalk that had been made for the portable 
“Notetaker” computers we had made; it was nicely sculpted by Dan Ingalls for this 
purpose; and was completely written in itself, including the "OS", development tools, 
etc — everything — in about 10,000 lines of code plus about 6k bytes of machine 
code for the Notetaker CPUs. 


This was my favorite of the “engineered” Smalltalks (that came after -72), and was 
great fun to program in. I was very happy to do the media presentation for the Ted
Nelson talk (on YouTube) entirely in it. 


Despite the one-bit-per-pixel graphics, I've thought about doing more talks using this. 
The biggest difference between then and now is that today there is much more room 
for images than back then: we had them but the memories were too small to use 
more than a few. 


Alan Kay
Still trying to learn how to think better - Updated 6y

As a beginner, what are the best ways to approach Computer Science?

If you are just looking to get a job in computing, don’t bother to read further.


First, there are several things to get clear with regard to any field. 


• What is the best conception of what the field is about?
• What is the best above threshold knowledge to date?
• How incomplete is the field; how much will it need to change?


When I've personally asked most people for a definition of “Computer Science” I've 
gotten back an engineering definition, not one of a science. Part of what is wrong with 
"CS" these days is both a confusion about what it is, and that the current notion is a 
weak idea. 


The good news is that there is some above threshold knowledge. The sobering news is 
that it is hard to find in any undergrad curriculum. So it must be ferreted out these 
days. 


Finally, most of the field is yet to be invented — or even discovered. So the strategies
for becoming a Computer Scientist have to include learning how to invent, learning
how to change, learning how to criticize, learning how to convince.


Most people in the liberal arts would not confuse learning a language like English and 
learning to read and write well in it, with the main contents of the liberal arts — which, 
in a nutshell, are ideas. The liberal arts span many fields, including mathematics,
science, philosophy, history, literature, etc. and we want to be fluent about reading 
and writing and understanding these ideas. 


So programming is a good thing to learn, but it is most certainly not at the center of 
the field of computer science! When the first ever Turing Award winner says 
something, we should pay attention, and Al Perlis — who was one of if not the definer 
of the term said: “Computer Science is the Science of Processes”, and he meant all
processes, not just those that run on digital computers. Most of the processes in the 
world to study are more complex than most of the ones we've been able to build on 
computers, so just looking at computers is looking at the least stuff you can look at. 


Another way to figure out what you should be doing, is to realize that CS is also a 
“blank canvas” to "something" kind of field — it produces artifacts that can be studied 
scientifically, just as the building of bridges has led to “bridge science”. Gravitational 
and weather forces keep bridge designers honest, but analogous forces are extremely 
weak in computing, and this allows people who don't know much to get away with 
murder (rather like the fashion field, where dumb designs can even become fads, and 
failures are not fatal). Getting into a “learned science of designs that happen to be
dumb” is not the best path!


We (my research community) found that having an undergraduate degree in 
something really difficult and developed helped a lot (a) as a bullshit detector for BS in 
computing (of which there is a lot), (b) as a guide to what a real “Computer Science” 
field should be and could be like, and (c) to provide a lot of skills on the one hand and 
heuristic lore on the other for how to make real progress. Having a parallel interest in 
the arts, especially theater, provides considerable additional perspective on what UI
design is really about, and also in the large, what computing should be about. 


So I always advise young people -not- to major in computing as an undergraduate
(there's not enough “there there”) but instead to actually learn as much about the 
world and how humans work as possible. In grad school you are supposed to advance 
the state of the art (or at least this used to be the case), so you are in a better position 
with regard to an unformed field. 


Meanwhile, since CS is about systems, you need to start learning about systems, and 
not to restrict yourself just to those on computers. Take a look at biology, cities,
economics, etc just to get started. 


Finally, at some age you need to take responsibility for your own education. This 
should happen in high school or earlier (but is rare). However, you should not try to 
get through college via some degree factory's notion of “certification” without having 
formed a personal basis for criticizing and choosing. Find out what real education is, 
and start working your way through it. 


Alan Kay
Still trying to learn how to think better - 6y


What are the backgrounds of engineer-scientists and scientist-engineers? 


I don't know how to give a reasonable answer (because there is quite a spread in
personalities, skills, and knowledge across the very large fields). But, I can give a
“non-reasonable” attempt at this. If you take “STEM”, realize that the “T” for “Technology”
just means whatever humans make — and so is too large and diffuse a category —
and look at the historical order of the fields, we can use “T” for “Tinkering”, then
“Engineering”, then “Math”, then “Science” to get TEMS.


Giving the past its due, it is quite reasonable to date what we think of as Science today 
to the turn of the 17th century, and especially the amazing “Principia Mathematica” of 
Newton was shockingly powerful both in method and philosophically. It also exerted a 
huge magnetic field on TEM to pull the Venn diagram of the 4 pursuits much closer 
together. A lot of the big hitters of the last 150-200 years have happened in the sweet 
spot intersection. 


Again, I don't know how to generalize here, but I've had the pleasure of working in
that sweet spot for quite a few years, so I can at least say something from my own
experience. 


The people I worked with all had gotten quite good at pretty much the same things,
with regard to Math, Engineering, and Science — basically, top levels of knowledge. To 
me, the main differences could be put down to temperament and personality type — 
and what they wanted to achieve, what it meant to have done good work. The result 
of this was that everyone had a lot to talk about, and everyone had the background to 
understand the different points of view and goals. Usually the different types could 
work together, and there the differences created magic. 


Alan Kay
Still trying to learn how to think better - 6y


If computer science looked more like engineering, what would "software 
simulators” do? 


Let's try to use as parallel analogies as possible to “science” (what physics, chemistry,
biology, etc. do), and to “engineering” (what civil, electrical, mechanical, bio-, etc.
engineering do).


And let's also look for interesting overlaps. For example, making bridges is “very 
engineering”, but once a bridge is made it manifests “phenomena” and scientists can 
study it to make models that manifest like the phenomena. Herb Simon (both Turing 
Award and Nobel Prize) wrote a terrific book called “The Sciences of the Artificial” that 
is worth reading. 


Nowadays it is hard to do “good bridge” without both deep science and deep 
engineering — and some deep computing, because we most definitely want to
simulate our understanding/models of bridges both for edification, and also for both 
success and safety. And from these better models we'll get better insights about how 
to design and build better bridges. 


To add an historical note here, an early act of real computer science happened when 
John McCarthy needed a programming language to program his “Advice Taker” AI in 
the late 50s, and he wanted it to be mathematical in form. To do this he made a model 
of computing and wrote it up as a start on “A Mathematical Theory of Computation”. 
This was “very science”! Steve Russell said "That's a program. If I programmed it we'd 
have a new language”. As McCarthy said, “He did, and we did!”. The result was Lisp — 
and it was also a “better bridge” than what was studied: the model suggested much 
better ways to think about programming than how FORTRAN went about it. 


Let's suppose we invent a programming language and we'd like to evaluate our 
design as early in the process as possible. One way to express the design is as a model 
— it's a kind of a “theory”. Unlike regular math there are usually too many degrees of 
freedom for classical proof techniques, but we can get a lot out of being able to run 
our theory. 


At this point, let's pause to think about what we are doing. On the one hand we are in 
something very like the CAD->SIM->FAB process that good engineering disciplines 
have (and let's pretend we have similar powerful tools for our domain). We can see 
that the “FAB” part of the process in making a language useful involves a lot of various 
kinds of optimization to fit existing machines, other systems, etc. (a lot of stuff). This is 
very similar to what has to be done in the physical world for FAB: lots of important 
nitty gritty stuff. Now what could “SIM” be? How about getting our CAD model 
language to run as directly as possible on a supercomputer so that we can write 
serious programs (including real-time ones) to see how we like our language? (Note 
that if we've defined the semantics of our language in a good way in the CAD part, 
then our tools should be able to more or less directly run and simulate the model.) 


On the other hand, we are very interested in the “computer science” aspects of our 
design. If we like Al Perlis’ (the first Turing Award winner) definition: “computer science 
is the science of processes” (he meant all processes!), then our CAD->SIM will allow us 
to deeply probe the process phenomena of our design (and we really need to!). For 
example, being able to program complex parallel systems is one of the keys to making 
progress in the future, and the problems are far from being just engineering type 
problems: they are also deeply scientific in nature. 


In short, we aren't very good at computing, thinking about computing, or doing 
computing. We need to get better designs vetted and improved much much quicker. 
To do this we need to have an intermediate “simulation” phase and the tools to go 
with it on every worthwhile project we want to do. And having these will suggest 
many projects we should be doing but just can't see. 


Alan Kay 
Still trying to learn how to think better - Updated 3y 


What does Alan Kay think about Gerry Sussman's recent talk at the Strange 
Loop conference titled "We Really Don't Know How to Compute!"? 


I like to read rather than watch videos. I've always been a big fan of Gerry since he was 
a grad student at MIT, and have generally agreed with him. If he is complaining about 
current programming and programming languages then he is probably right and I'd 
probably agree with him. 


I'm motivated enough to get this transcribed and read it**. Sound odd? I can 
remember pretty much everything I read, and can read a lot faster than I can "watch". 
I feel as if someone is subtracting brain cells from me when I have to watch 
something (except maybe someone good playing music). (So I'll get back to this in a 
while.) 


Bottom line is that Sussman is *interesting*. An interesting person is someone who is 
interesting whether you agree with them or not. So it's always worth finding out his 
opinion. 


** Thanks to Hrvoje Krot for transcribing this talk (lyssphacker/talks)! 


When I found a few days ago this had happened, I read the talk, and now will try to 
answer the question more directly. 


I think this is a terrific talk by a guy who has never been afraid to have, and let us 
know, his own opinion (and most importantly, willing to do the work to substantiate 
his opinions). 


The heuristics he gives are very worthwhile to internalize. His arguments about and 
against this, that, and many other things are all worth pondering deeply. 


I think he would agree that one of the properties that computing shares with more 
traditional math, is that the parts of it that are understood can be understood 
completely. The nature of the descriptions is to be about themselves in a way that can 
exist entirely in our minds without worrying about the “noise of physics” etc. In fact, 
computing looked at this way is a new branch of mathematics. 


This is another way of supporting his quite right assertions that computing should not 
be conducted anything like a religion, with creeds to be dogmatically believed in, and 
followers of creeds, and inter-creed disputes and warfare. 


Instead he argues for “really thinking things through” and this “from scratch” as much 
as possible. Those who don't are continuing to make a major mess of things and 
issues that need to be solved rather than patched over. 


The “bottom line” above is substantiated. I don’t agree with every notion in this talk, 
but I'm very glad to have heard all of them. 


Alan Kay 
Have designed a few programming languages - 6y 


How do you know when you have the chops to build your own tools? 


I don't know a good general answer. The ARPA-Parc gang got some sense of it by 
building tools, and sometimes undergoing what Butler Lampson once referred to as 
“bitter experience”. In those days of new computer architectures and non-standardized 
operating systems (or no operating systems) there was more tool 
building as a matter of course, and a lot of dives down ratholes never to return. 


This led in the field to a bit of a separation into a large group that did not want to 
build tools anymore, and a smaller group that was motivated to find better ways to 
build tools, including making tools to build tools. 


Parc was a bit of a sweet spot in time because quite a few heroic things had been 
done in the larger community in the near years before: for example, we were all 
impressed that the Engelbart group had adapted and made a “compiler-compiler” to 
make their own systems programming language for the SDS-940 (a tiny slow machine) 
and did it well enough to make it really pay off. Many of the people at Parc — like 
Butler, Peter Deutsch, etc. — had done a number of tough systems, including 
operating systems and programming languages. Some of us liked how “nice” and 
“tidy” Lisp was (if only something like it could be run fast — maybe on a custom 
machine?). 


The other thing that helped was that the number of computer people at Parc in the 
early days was just a few dozen, and there were a lot of things we wanted to do. The 
predominantly scientific backgrounds of the researchers responded to “too big and 
too complex" with “we need better maths!”. (This is a bit hard to explain these days — 
but take a look at how Maxwell's Equations could fit on a T-shirt from what they were 
in Maxwell's papers.) 


Translated: make super high leverage to deal with the problems. One of the solutions 
was whole-system design, so that the custom hardware optimized the software rather 
than having to do SW to work around the hardware, and this allowed “Lisp and 
beyond” software solutions to stay simple and efficient enough to have real power. 


The simple key of "doing tools" is that they mustn't take over the goals, so they have 
to be done quickly and well — for example, Chuck Thacker did the whole first Alto in 
just a little over 3 months; Dan Ingalls got the first version of Smalltalk going in a 
month, from some handwritten kernel code that took me about two weeks to come 
up with. 


This is where the “chops” come in (and part of “chops” — when you are playing jazz or 
classical — is that first and foremost you have to produce “real music”, and this means 
you have to stay within your current abilities and concentrate on the art). You “get 
chops” by practicing for some years before you have to perform ... 


I needed to add that having chops still doesn’t mean every endeavor is going to work 
... We occasionally at Parc would get into “little troubles” from biting off a bit more 
than we could chew that sometimes were pernicious. The recoveries from a couple of 
these later made for good stories around the campfire. 


Alan Kay 

Have designed a few programming languages - Updated 6y 

What did Alan Kay mean by, "I made up the term object-oriented, and I can 
tell you I did not have C++ in mind."? 


It's hard to praise too highly the programming languages that are the bridge from one 
way of looking at programming to much better ways of looking at programming. The 
two greatest such in the 60s were Lisp and Simula. Perhaps the greatest single 
conception of a software system of the 60s was Sketchpad. 


As explained elsewhere on Quora, and in “The Early History of Smalltalk", I had chance 
encounters with Sketchpad and Simula in my first week of grad school in late 66, that 
shocked me into a realization about “computers as basic and universal units” via the 
connections and parallels with other like things, such as biological structures, 
computers on networks, processes in time-sharing systems, general systems of parts 
intercommunicating, and so forth. I started to think about dynamic languages to make 
such processes, and how the processes could be made efficient and parsimonious 
enough to be universal. 


Someone asked me what I was doing, and without thinking, I said “object-oriented 
programming”. (A very bad choice as it turned out, for many reasons.) 


In the first years of the 70s at Xerox Parc with the great help of a terrific research 
group, and especially Dan Ingalls, we were gradually able to solve the software 
engineering problems to make this “systems approach” to programming practical (and 
especially on the emulation architecture hardware of Chuck Thacker). The starting 
place was how Lisp systems were built and made efficient. 


This led to real power of expression, and a number of breakthroughs at every level, 
from the user interface to the metal. 


This attracted great attention, and 10 years later in the 80s “object-oriented 
languages" started to appear. The only ones that had some of the same flavor as 
Smalltalk were several versions of Lisp. There were several “object Pascals”, even an 
“Object COBOL"!, and of course, C++. 


It's important to realize that C++ was part of a chain of ideas starting around 1979 by 
Bjarne Stroustrup who was also shocked into action from encountering Simula. He was 
not trying to “steal from Smalltalk” in any way. Here's what he says in his history paper: 


C++ was designed to provide Simula’s facilities for program organization together with 
C’s efficiency and flexibility for systems programming. ... The goal was modest in that it 
did not involve innovation, and preposterous in both its time scale and its Draconian 
demands on efficiency and flexibility. 




Elsewhere, he takes pains to say that he’s “not trying to do what Smalltalk at Xerox 
Parc did”. He was essentially trying to do with C what Simula did with Algol. 


His approach was via “Abstract Data Types” (which the co-inventor of Simula — Dahl 
— also liked), and this is the way “classes” in C++ are generally used. And, similar to 
Simula being a preprocessor to Algol, C++ was a preprocessor to C: the “classes” were 
program code structuring conventions but didn't show up as objects during runtime. 


And for a variety of reasons, some of them not good, some reasonable, C++ got very 
popular. 


It became part of the “colonization” of the term “object-oriented”. By the end of the 
80s we could not explain the software at Parc as being “object-oriented” because the 
term had become co-opted. This kind of thing happens a lot in many areas. So it was 
understandable, and Bjarne certainly wasn’t to blame. The term became a kind of fad. 


But it was (and is) quite annoying. I had to start calling the Parc stuff “real 
objects", etc. 


The push-back on complaints was the same one for pop changes in language usage 
and meaning i.e. “the people determine what a word or term means”. 


But the -ideas- of OOP that worked so well for us at Parc came from many places, and 
we liked Goethe's quote: “We should all share in the excitement of discovery without 
vain attempts to claim priority”. We did what we did, and we called it what we called it. 


Hence the quote, which is not so much about C++ per se but about the term that we 
had been using to label a particular approach to program language and systems 
design. 


— Note: Quora tends to bury comments, so I'm copying an interesting one here 
along with my reply: 


Abhinav Sharma: Thanks! I'm curious if you considered rebranding “real OOP” once 
the term had gotten colonized? I realize it must feel sad to have the term you coined 
get misconstrued but do you think a rebranding would be a good way for the ideas 
themselves to live on? 


AK: I think the ideas are living on for historical purposes for those who are interested. 
And something can be learned about human nature by comparing implemented 
artifacts from the past with what got accepted and why. 


On the other hand, the 70s idea of “real OOP” was hugely powerful back then, but 
what was implemented was far from a complete set of ideas, especially with regard to 
scaling, networking, etc. Dave Reed's MIT thesis happened in 1978 and this dovetailed 
with ideas I thought important also, but funding and other circumstances delayed 
implementing his version of “real objects” until the early 2000s (this was Croquet). 


How dynamic objects intertwined with ontologies and inference was explored by 
Goldstein and Bobrow at Parc. Their 4 papers on PIE and their implementation were 
the best extensions ever done to Smalltalk, and two of the ideas transcended the 
Smalltalk structure and deserved to be the start of a new language, and perhaps have 
a new term coined for it. 


One way to look at this is that computers have enormous degrees of freedom and 
capacity, and this makes it very difficult to come up with great tools to program them. 
This is a mismatch with how humans learn and particularly with how we learn and do 
skill based activities. The latter tend to be very conservative, whereas the demand of 
computing is that we have to shed our skins every few years (and we don't). 


I don't think that “real OOP" as we thought of it then, is the way to go in the future 
(and didn't then). Consider Sketchpad, and that it is programmed in terms of 
constraints that the system solves to guarantee a very close fit between what was 
desired and what is manifested. This is an early glimpse into “requirements-based 
programming”. It has something like objects (it is hard to get away from the main idea) 
but is “relational” rather than message-based (the messages are implicit, etc.). 
Sketchpad was a tour de force in many ways, including its solvers. But they didn't go 
far enough to handle general requirements. Today I think this is doable via a half 
dozen new techniques plus enormously larger machine capacities. 


Stuff like this is what we should be working on! 


Alan Kay 
Still trying to learn how to think better - 6y 


I don't want to be a specialist. How can I be a great generalist in computer 
science? 


Every once in a while I’m asked “What does it mean to be a Computer Scientist?”. My 
answer is "I don't know exactly, but take a look at Butler Lampson. He is certainly the 
quintessential Computer Scientist if ever there was one". 


We find in Butler someone who covers in knowledge and skills a very wide range in 
computing and software engineering, who has not just designed/invented but also 
built operating systems, programming languages, networking and internetworking 
systems, desktop media, specification languages, atomic transaction protocols, and 
much more, and hardware architectures and detailed designs for making hardware ... 
And more. A partial summary is here. 


He must have been the easiest Turing Award winner to choose in 1992 (the top 
recognition in computing), and is one of the few ever to have also been awarded the 
top engineering award, the IEEE John von Neumann Award. He was inducted into both 
the National Academy of Sciences and the National Academy of Engineering. Etc. 


Once in an ARPAnet design meeting in the 60s, Butler was supposed to give a one 
hour talk at the end of the day, but the meeting dragged on. When it was his turn, he 
looked at his watch and said “I've got a plane | have to catch", and proceeded to give 
the -entire- talk in 20 minutes, in perfect understandable English at great speed! It was 
shockingly impressive even from him. The person next to me said “It's a great privilege 
to just know someone like Butler”. That talk, on a capability based operating system 
design, covered the content in a paper you can read. 


Butler is also one of the clearest writers in our field. Here is a partial list of his 
papers over the years. And here is the first page of his website with a few tips on 
stuff to look at that has been most popular with readers. 


Certainly in the top few of the most impressive people I've ever met, and especially to 
have had the fun of working with. 


Now to try to answer your question. I think that, besides Butler's obvious brilliance, 
there are a few keys to his range and depth, and many of these revolve around 
-systems and processes- rather than hardware and software, or programming 
languages and operating systems. Dealing with whole systems was in the zeitgeist of 
the ARPA community back then, and it found Butler while on his way (after an 
undergrad degree in Physics from Harvard) to grad school in Physics at Berkeley. As 
he puts it “I walked in the wrong door and found people trying to debug code for a 
new computer. And never found the doorway out”. 


He wound up being the principal designer of the operating system of this computer 
(Project Genie), which was a breakthrough in practical and affordable time-sharing. 
Its commercial version, the SDS 940, was both the computer that Engelbart's 
“mother of all demos” was done on and the one that launched the 
commercial time-sharing industry. 


Butler was and is interested in -anything- that has dynamic relationships, i.e. systems 
and processes (and note the analogies with what is studied in physics). I think 
looking at the world this way, as all systems and processes (including physical and 
biological), will round you into a “generalist in computer science” who can be a 
vital contributor, not just a dabbler. There are only a few people around like Butler, but 
how he has gone about his art can be and is an inspiration for the rest of us. 


Alan Kay 
Still trying to learn how to think better - 6y 


How could public libraries disrupt themselves? 


Libraries made their way in the US partly because they provided something not easily 
found otherwise. Still, it's sobering to realize that about half of all the libraries in the 
US by about 1920 were there because they were funded by a single philanthropist 
(Andrew Carnegie). He was an immigrant child laborer from Scotland and started his 
climb to immense wealth in his early teens because his first boss opened his house 
library to his laborers on Saturdays. 


Most “Carnegie Libraries” had open stacks (apparently the idea originated with these 
grants), and almost always a special room for children, and another special room 
where reading was taught in the evenings. 


In the ARPA-Parc work in the 60s and 70s on personal computing and pervasive 
networking, two of the most compelling images for the future were to take ideas like 
the Carnegie Libraries (with their special children's rooms and rooms where you could 
learn to read) and the Whole Earth Catalog (with its perspectives and interlinked 
“systems of systems” of ideas) to the next levels that computers would make possible 
— not just to imitate them and make them more convenient, but to extend what 
literacy, reading, writing, understanding could be. 


Today, libraries are on the decline, along with bookstores. -Something- is more 
convenient for some people, and whatever this is has started to displace the old 
resources and venues. As with many things in the age of electricity and electronics, the 
displacement is not being done by replacement with media of similar value, but by 
subsets, sometimes disastrous subsetting. For example, especially for good readers, 
none of the screens available now are as good to read from. 


We can find out about the existence of a lot of small things, but generally speaking we 
don't wind up with whole books when we use a search engine for something. So in 
many cases — as with replacing newspapers with TV — we lose depth and width. As 
media scholars have pointed out, this started happening in important ways after the 
telegraph ca. 1845 made currency of information more important than content. 


Similarly, going to smaller screens with smaller viewing angles (regardless of the pixel 
density) is a subtle barrier for extensive reading. Going to non-keyboards is a subtle 
barrier for writing real prose (rather than transcribing short oral modes of thought). 


Another change, and I think a disastrous one, is the loss of what one got from the open 
stack approach: you went looking for one thing, and found many things you didn't know to 
look for (and even about the subject you were interested in). So it is surprising (from 
the standpoint of reason and idealism) that Amazon has done a pretty terrible job of 
providing UIs that make serendipity more possible (they haven't done zero, but I 
wonder if they do understand the larger idea here). This is a shame because I love that 
Amazon has such a large catalog, and especially a second hand catalog. If you know 
what you need and want, there's a good chance you can get it. 


On the other hand, Amazon at present doesn't have a “free” library idea — that is a 
library supported by taxes for those who can't afford to buy a book every other day. 
Libraries are quite expensive as individual transactions. (Some years ago the 
Dartmouth library in a study said $55 per book taken out. This cost is subsumed for 
the public benefit to make a “free” library. This got Dartmouth to do the same thing 
for the entire town with the university's time sharing system! You could find a teletype 
at a gas station with the operator doing his accounts in a program he had written in 
Dartmouth BASIC!) 


And it's very surprising to me that there doesn't yet seem to be a free system online 
that will teach people to read. If semi-AIs are to have any real value at all, this kind of 
service is what is needed from them. 


To try to answer Eddie's question, I think we will see further erosion of the physical 
libraries unless something unique and important about the physical structures can be 
realized and enhanced. How about the teaching and children’s functions of the 
original Carnegie Libraries? Both of these are enhanced by having people learning and 
doing together. How about providing much better ways to browse books? And to 
“take books out"? All of the functions suggested in this paragraph are being done 
poorly by the current consumer companies: there is room for something of higher 
value and supported by the local communities and philanthropists. 


One of many of Carnegie’s insights is that libraries are not just for access to 
information, but that they create important social and amplifying roles in their local 
communities. Better ideas along these lines will make a big difference. 


And could online as well. And for the same reasons via the poor subsetting that is now 
being accepted as replacements. The progression has been to redefine “normal” 
downward, and this makes it difficult for unsophisticated people to see the decline 
(things look somehow better — but they are confusing convenience for content). 
Today, it's a bit hard to imagine a force for good to make this happen. We don't seem 
to have Andrew Carnegie types today (we do have some large billionaires who do 
some philanthropy, but so far there is not much of a comparison). A reasonable 
political system (the oxymoron of the century) would be focusing immense resources 
both on the early grades of school and on community learning centers, especially in 
the evening (how about adding libraries to community colleges, etc.?) 


Finally, I think I'd take a shot at dealing with the homeless (who are often the most 
numerous in libraries these days). Maybe this is too big a task to try to include therapy 
and hope as part of a public institution's functions, but on the other hand, why not? 
The Greek lawgiver — Solon — was reputed to have said "You can judge a society by 
how well it treats its least fortunate citizens”. I think our society has a long way to go 
in this regard. 


Alan Kay 
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 6y 


What was the total economic impact of research at Xerox parc in the 1970s? 


Hard to nail down and substantiate (and it would be good and interesting to do so). 
Some years ago I saw an estimate (I think from a university) that said “in excess of 
$30 Trillion dollars, perhaps as much as $35T". I have no idea whether this was really 
drawn from comprehensive data (and how can you draw lines between (say) the 
impact of integrated circuits and that of personal computing, the GUI, the Internet, 
OOP, laser printing, etc.?). 


However, figuring an average of about $1T/year contribution from ARPA/Parc 
(impossible for me to separate these) within the much larger whole sector since 1980 
— perhaps a little less — seems pretty reasonable to me (that’s not a substantiated 
estimate, just a supposition). 


Another way to look at it could be to assign value based on importance. Integrated 
circuits are right up there. A very large part of the market completely required the GUI 
to be invented. A substantially overlapping part of the market completely required the 
Internet to be invented. ARPA/Parc was the main factor in the inventions of the last 
two. 


One part of this I do know with more foundation is that Xerox paid for all of Parc 
many times over from the profits of just one of the inventions: the laser printer by 
Gary Starkweather. I don't know exactly how much the four labs of Parc cost in the 
decade of the 70s, but it was probably something on the order of $200 million in 
today’s dollars. So even one billion of profit is a factor of 5 ROI, and the actual profits 
were far larger. 
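
The factor of 5 above is simply profit divided by cost. A minimal sketch of that arithmetic, using the rough figures from this answer (the $200 million cost is the guess stated above, not a documented number, and the function name is made up for illustration):

```python
def return_multiple(profit, cost):
    """How many times over the profit covers the cost."""
    return profit / cost


# Rough figures from the text, in today's dollars (both are guesses).
parc_cost_1970s = 200_000_000        # "something on the order of $200 million"
laser_printer_profit = 1_000_000_000  # "even one billion of profit"

print(return_multiple(laser_printer_profit, parc_cost_1970s))  # 5.0
```

Since the actual profits were far larger than one billion, the real multiple would be correspondingly higher.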


Alan Kay 


Still trying to learn how to think better - Updated 6y 


What was Xerox PARC's mission statement? 


I don't recall one. I'm pretty sure (not totally) that there wasn’t an “official” one. 
But the intent of Jack Goldman — the Chief Scientist of Xerox, a physicist, who 
decided to make PARC — was to devote a small percentage of Xerox's R&D resources 
to “non-directed” research in the areas of Physics and Chemistry, Optics, and 
Computing. 


The understanding was that the scientists would very seriously take Xerox's present 
and possible and desirable futures into account when deciding what to work on. The 
previous sentence is an informal “mission statement”. 


I think this worked very well. It's hard for me to recall any project done in any of the 
labs that wasn’t sincerely thought of by the researchers as being for the benefit of the 
company, and I think most of the choices made in all the labs really were of 
potential benefit to the company. I also think that it was the researchers themselves 
who decided what to work on within this context. 


It was certainly the case that "the company” (meaning the corporate executives) really 
had a tough time understanding many of the results. The disconnects with the results 
from the computer labs are famous. But let me mention the wonderful flat screen 
display technology done with particle migration (much later used in the eInk displays 
on many eBook readers), one of whose main researchers was Anne Chiang. Xerox 
had many of the key patents on many kinds of flat screen technology (including LCDs 
and amorphous semiconductors). But “the company” just wasn't willing to invest in 
taking their special inventions to manufacturing and marketplace. 


I needed to mention here that despite the disconnects, Xerox made billions from one 
of the Parc inventions (the laser printer by Gary Starkweather) and paid for all of 
Parc many times over. This is the biggest point of this story: that the multiples from 
new inventions are often so high that doing risky research in many areas with top 
people and modest resources is a not-to-be-missed strategy. 


Alan Kay 
Still trying to learn how to think better - Updated 6y 


If you had to teach ‘Computer Science 101’, how would you begin your first 
lecture? 


Let's think about this question for a bit. Some years ago I was invited by “CS 
Departments” to give talks at a number of universities. Almost accidentally I asked the 
first audience of undergrads, grad students and professors for their definition of 
“Computer Science”. Not one could give other than an engineering definition. I did 
this at each new place on the tour with identical results. Other questions I asked 
included “Who was Doug Engelbart?” A few said “didn't he have something to do with 
the mouse?” (this was extremely upsetting since my research community had put in a 
lot of effort to make it possible to answer such questions with two or three clicks of 
the mouse that Engelbart indeed had something to do with). The problem was partly 
lack of curiosity, partly quite narrow internal goals not having to do with learning, 
partly not having the faintest idea of what a “science” might be about, etc. 


I've been an adjunct faculty member of the UCLA CS Department for a few decades 
(basically a full professor but I don’t have to go to faculty meetings). I occasionally 
teach classes, and occasionally with freshmen. Over the years, the already low curiosity 
about computing has dropped considerably (and it has also gotten a lot more popular 
as it is seen as a route to a well paying job by learning to program and getting 
certified by a “top 10” school). Accordingly no student has been heard to complain 
that the first language at UCLA is C++! 


I think we are faced with a situation where both the meanings of “computer” and of 
“science” have been done in by massive weak conceptions to create a new term (a 
kind of a designer jeans label) that sounds good but is quite hollow. A related term 
that has similarly been done in is “software engineering”, again by not taking the 
largest ideas of both “software” and of “engineering” and combining them (the large 
senses of both were the intent in the 60s when the terms were originally coined). 


Since one of the reasons to actually go to a university is to go beyond mere vocational 
training, and instead to contact deeper ideas, I think it is quite reasonable for any 
good introduction to a field to try — through examples when possible — to have the 
learners touch the real stuff and to start to get a feeling about what is interesting and 
important and core to the field. 


First graders delight when you show them that a ruler on top of another ruler can 
make an “adding machine” where they can outperform 5th graders adding fractions. 
They will then happily contribute to making even better adding machines. They have 
touched a real computer — a physical or mental tool to help us think. They have 
learned a really powerful way to represent numbers — more powerful than their 
schools will teach! They've been able to connect their commonsense idea of “addition” 
as “accumulation” to something analogous with powerful new properties. They have
programmed it to make it solve a large variety of addition problems. They have 
extended it. Etc. It is not a digital computer. It is not a stored program computer. But it 
is the essence of a computer. Just as the Antikythera mechanism is the essence of 
what a computer and computation is all about. 
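
The two-ruler adding machine can be mimicked in a few lines. This is only a sketch: the function name `slide_and_read` and the choice of fractional-inch marks are illustrative assumptions, not something from the classroom version described above.

```python
from fractions import Fraction

def slide_and_read(a, b):
    """Add a + b the way two stacked rulers do.

    Slide the top ruler so its zero mark sits over `a` on the bottom
    ruler, then find `b` on the top ruler and read the bottom-ruler
    mark directly beneath it.
    """
    offset = a            # where the top ruler's zero now sits
    return offset + b     # bottom-ruler position under top mark b

# Fractions that are awkward on paper are just marks on the ruler.
print(slide_and_read(Fraction(3, 8), Fraction(5, 16)))  # 11/16
```

The "program" is the act of sliding; the same physical setup solves every addition problem within the length of the rulers.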


How far and large can we go before things get out of hand and we lose our way in too 
much abstraction? I've always been partial to the characterization of Alan Perlis — the 
first Turing Award winner and who perhaps coined the term “Computer Science” — 
who said in the 60s: “Computer Science” is the "science of processes". All processes. For 
Quora, let's not try to pound this further or make it into religious dogma. Let's just 
happily use Al Perlis’ idea to help us think better about our field. And especially how 
we should go about trying to teach it. 


Now we need to look at the modern meaning of “Science”, and Perlis was quite firm 
that this should not be diluted into old meanings (for example, the gathering of 
knowledge) and uses (such as "Library Science” or even "Social Science”). He meant 
“science” as trying to understand phenomena by making models/maps that try to 
exhibit, “track”, and predict the phenomena. 


I've given several talks on how many of the best maps and models can often fit on a 
T-Shirt, as we have seen with Maxwell's and other equations.


An analogy here is that there is a “Science of Bridges” even though most bridges are
made by people. But once a bridge is made, it exhibits phenomena, scientists can 
study them, models of many kinds can be made of bridges, and large comprehensive 
and useful “Theories of Bridges” can be formed. The fun of this is that then new 
bridges can be designed and made (I've mentioned before that there is hardly 
anything as fun as getting scientists and engineers together to tackle things big and 
important!) 


Herb Simon — a Turing Award and Nobel Prize winner — called all of these “Sciences 
of the Artificial” (and wrote a terrific book with the same name). 


Let me give an example here. In the 50s companies and universities were building 
stored program computers, and starting to program them — and there was a special 
moment when Fortran appeared in 1956 — not the first higher level language, but 
arguably the first one engineered well enough to be used in many different areas, 
including many that had only been done previously in machine language. 


All of this produced “phenomena”. 


The story of Lisp is more complex than I can tell here, but John McCarthy got
interested in trying to find a "Mathematical Theory of Computation”, and he was 
determined to make both sides of the phrase work as well as possible. The function 
eval that will interpret a Lisp program into its meaning, just does fit on a T-Shirt! 
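
To give a feel for how small such an evaluator can be, here is a rough Python stand-in. It is not McCarthy's original definition: the list-based representation, the name `evaluate`, and the handful of primitives in `GLOBAL` are all assumptions made for this sketch.

```python
import operator

def evaluate(x, env):
    """Evaluate expression x (nested Python lists) in environment env."""
    if isinstance(x, str):                 # symbol: look it up
        return env[x]
    if not isinstance(x, list):            # number: self-evaluating
        return x
    head = x[0]
    if head == "quote":                    # (quote e) -> e, unevaluated
        return x[1]
    if head == "if":                       # (if test then else)
        _, test, then, alt = x
        return evaluate(then if evaluate(test, env) else alt, env)
    if head == "lambda":                   # (lambda (params) body)
        _, params, body = x
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    f = evaluate(head, env)                # application: evaluate the operator,
    args = [evaluate(a, env) for a in x[1:]]   # then the operands,
    return f(*args)                        # then apply

GLOBAL = {"+": operator.add, "-": operator.sub,
          "*": operator.mul, "<": operator.lt}

# ((lambda (n) (if (< n 10) (* n n) (quote big))) 7)
program = [["lambda", ["n"],
            ["if", ["<", "n", 10], ["*", "n", "n"], ["quote", "big"]]], 7]
print(evaluate(program, GLOBAL))  # 49
```

The sketch leaves out definitions and recursion by name, but the whole interpreter is already visible at once, which is the point being made about Lisp.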


This makes it tiny compared to a “programming system”. More important, this “theory 
of computation” was also a much more powerful conception than Fortran! It was a 
better idea of a bridge! 


The tininess of Lisp allows the whole idea of programming at a deep level to be 
grasped in one eyeful and a few fingers, and to be thought about in ways not possible 
when looking at enormous artifacts (this is one of the reasons that scientists like their 
maths to be compact and powerful). The math used here is a new math, because it 
allows notions of before and after, and this led to the “logic of fluents”, also by 
McCarthy, that allows both functional relationships and reasoning to be preserved 
while also allowing state and the passage of time. (This is still not understood today in 
the rough and tumble world of ad hoc programming.) 


Lisp as both a powerful programming language, and a metalanguage that can express 
its own theory, is an example of real “Computer Science”. 


If you learn this, and other things like it, then you will be a much more powerful 
thinker, and more in charge of your own destiny than if you had just learned to 
program in Fortran, or its modern equivalents (... you will be on your way to actually 
learning how to be a computer scientist). 


You will learn a lot more about the special kinds of design that are needed in 
computing (for example, it is not generally appreciated that computing often really 
requires one to transcend the computing environment one starts with: one of the 
special characteristics of stored program computing is that it is not just material for a 
program, but material for a whole new computer). 


Another reason for Al Perlis’ choice of definition is because computing in the large is 
much more about making systems of many kinds than it is about algorithms, “data 
structures”, or even programming per se. For example, a computer is a system, 
computations are systems, the Ethernet and Internet are systems, and most programs 
should be much better systems than they are (old style programming from the 50s 
has lingered on to the point that it seems to many that this is what “programming” is 
and should be — and nothing could be farther from the truth).


The Internet is a good case in point — unlike most programs today, the Internet 
doesn't need to be stopped in order to fix or improve it — it is much more like a 
biological system — by our intention — than what most computer people think of as 
a computing system. And as such it is much more scalable and robust than almost all 
software systems done to date. This should really be pondered before weak ideas are 
given to beginners as representing “computer science”! 


So what we need to do in a good first course in computing is to take into account 
what the students can do in the beginning, and then try to stay inside their “cognitive 
load" limits to help them get to what's really important in a way that they can both 
touch and make it themselves. It is critical to “stay real” here and find ways that are 
intellectually honest and can fit those just starting to learn. (Please don’t teach bad 
ideas just because they seem a little easier — a lot of bad ideas are a little easier!) 


The students need to start by making something themselves that has most of the 
important characteristics I've discussed here. It needs to be a system of multiple 
dynamically interacting parts, etc. A good way to assess what language to use is 
whether it is simple to make something that has thousands of interacting parts! If not, 
you need to find one. The worst thing that can be done is to get the students on a 
path to a too weak fluency that is a too limited subset of the big ideas. This just kills 
them — and we want to grow them not kill them. 


Alan Kay
Still trying to learn how to think better - Updated 6y


How would you interpret Marvin Minsky’s idea that “Computer science
already has a grammar. What it needs is a literature.”? 


The most interesting aspect of Ken's blog entry (including the comments) is that 
nowhere do we find an historical reference to this idea. In fact, more than 50 years 
earlier in the 60s it was a part of many a conversation, and if I recall correctly, a few
papers. 


I first heard the idea from Bob Barton when in grad school ca 1967, and he said that
this was part of the motivation that Don Knuth had for writing his “Art of Computer
Programming”, chapters of which were already being passed around. One of Bob's
main questions back then was about “programming languages designed to be read by 
humans as well as machines”. And this was a main motivation for parts of the design 
of COBOL in the early 60s. And perhaps more important for our subject here, in the 
very early and quite beautifully designed interactive language JOSS (mostly by Cliff 
Shaw). 


As Frank Smith has pointed out, a literature starts with ideas worth discussing and 
writing down; it often partially generates representations and extends existing 
languages and forms; it leads to new ideas about reading and writing; and finally to 
new ideas that were not part of the original impetus. 


Part of the idea of “being literate” is both doing the reading and writing, and 
incorporating references to other writings that may be of interest. For example, 
Marvin's Turing Award lecture starts out with him saying that “The trouble with 
computer science today is an obsessive concern with form instead of content.”


He meant that what's most important about computing is meaning, and how it can be 
dealt with and represented, rather than one of the big topics of the 60s, how to parse 
both programming and natural languages. For him, what was most interesting about 
his grad student's — Terry Winograd’s — thesis at the very same time was not how 
well it could handle English grammar (it was very good at it) but that it could put 
meaning to what was said to it, and could reason using that meaning. (This is an echo 
from the past of what Ken reports in Marvin's blog.) 


A parallel way to look at the “universally learned language” idea is again from seeing 
the difference in what different writers can discuss and illuminate through writing. 
Quite a bit can be done without changes to the language, and even without adding 
vocabulary. Similarly with math symbols and syntax — it is very easy to write a formula 
that the reader can easily parse, but where the meaning would require them to study 
for a few years. This is part of what Marvin is driving at. One of the fun Turing 
machines in Marvin's book “Computation: Finite and Infinite Machines” (one of my 
favorite books ever), is a fairly typical computer with only two instructions (add 1 to a
register, and subtract 1 from a register and branch to a new instruction if the register 
goes less than 0 — there are many variants). 
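
A machine of this kind takes only a few lines to simulate. The sketch below uses one of the many variants: the subtract instruction branches when the register is already 0 rather than when it would go below 0. The tuple encoding and the name `run` are assumptions for illustration and are not taken from Minsky's book.

```python
def run(program, registers, max_steps=10_000):
    """Run a two-instruction register machine.

    program: list of ("inc", r, next) or ("dec", r, next, branch).
      inc: add 1 to register r, go to `next`.
      dec: if register r is above 0, subtract 1 and go to `next`;
           otherwise leave it alone and go to `branch`.
    The machine halts when the program counter leaves the program.
    """
    pc = 0
    for _ in range(max_steps):
        if not 0 <= pc < len(program):
            return registers
        op = program[pc]
        if op[0] == "inc":
            registers[op[1]] += 1
            pc = op[2]
        elif registers[op[1]] > 0:
            registers[op[1]] -= 1
            pc = op[2]
        else:
            pc = op[3]
    raise RuntimeError("step limit reached")

# Addition: move everything from register "b" into register "a".
add = [
    ("dec", "b", 1, 2),   # 0: if b is 0 halt (jump to 2), else b -= 1
    ("inc", "a", 0),      # 1: a += 1, loop back to 0
]
print(run(add, {"a": 3, "b": 4}))  # {'a': 7, 'b': 0}
```

Even addition needs a loop here, which illustrates the gotcha: the language can be learned in a minute, but almost nothing can be said in it concisely.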


This is a universally learnable programming language, but note the gotchas. A 
reasonable solution to “universally learnable” also needs to have certain kinds of 
expressive power — which will likely take longer to learn — and then you start having 
tradeoffs with a public that really doesn't want to do much learning on anything. 


Don's interest in what he called “literate programming” led to an authoring system 
(originally called WEB) that would allow Don to explain the very program that was 
being written, and which included many features to allow extraction of parts of the 
program for human examination. The idea was that the WEB document was the 
program, and the compiler could extract the compilable and runnable parts from it. 
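
The "document is the program" idea can be sketched in miniature. The `@code` and `@end` markers below are invented for this example and are not WEB's actual syntax; the real system is far richer.

```python
def tangle(document):
    """Extract the runnable program from an explanatory document.

    Lines between a `@code` line and the next `@end` line are program
    text; everything else is prose for the human reader.
    (`@code`/`@end` are made-up markers for this sketch, not WEB syntax.)
    """
    program, in_code = [], False
    for line in document.splitlines():
        if line.strip() == "@code":
            in_code = True
        elif line.strip() == "@end":
            in_code = False
        elif in_code:
            program.append(line)
    return "\n".join(program)

essay = """We want the area of a circle. First we fix the constant.
@code
PI = 3.14159
@end
Then the formula itself follows directly from the definition.
@code
def area(r):
    return PI * r * r
@end
"""

source = tangle(essay)
namespace = {}
exec(source, namespace)          # the extracted text is a real program
print(namespace["area"](2))
```

The essay stays in the order that suits the explanation, and the extraction step recovers what the machine needs.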


Another early wrinkle was the idea of dynamic media which was an active idea in the 
late 60s, and for many of us was to be an important part of interactive personal 
computing. One of the several motivations for this idea was to have something like 
Newton's Principia in which the “math” was dynamic and could be run and was 
attached to graphics, etc. This was part of the impetus behind the Dynabook idea in 
1968. One of the terms that started to be used back then was “active essay”, where the 
kinds of exposition and argument one expects in an essay are enhanced by having an 
interactive program as one of many media types for the new kind of document. 


Some very good examples were done in Hypercard by Ted Kaehler and myself in the 
late 80s and early 90s. Hypercard wasn't directly set up to do this — the scripts were 
not media objects for the cards — but you could do some work and get scripts to 
show on the cards and be interactive. A particularly provocative example was “Weasel” 
which was an active essay explaining a part of Richard Dawkins’ book “The Blind
Watchmaker” by allowing a reader to experiment with a framework that used a kind of
breeding process to find target sentences. 
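
For readers who have not seen it, the breeding process can be sketched as below. This is not the Hypercard stack described above. It is a minimal Python version of cumulative selection, and the brood size, mutation rate, and function names are assumptions; the target sentence is the one used in Dawkins' book.

```python
import random
import string

ALPHABET = string.ascii_uppercase + " "

def score(candidate, target):
    """Number of positions where candidate already matches target."""
    return sum(c == t for c, t in zip(candidate, target))

def mutate(parent, rate, rng):
    """Copy parent, replacing each character with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c
                   for c in parent)

def breed(target, offspring=100, rate=0.05, seed=0, max_generations=10_000):
    """Cumulative selection: each generation keeps the best-scoring string."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in target)
    for generation in range(max_generations):
        if parent == target:
            return parent, generation
        brood = [mutate(parent, rate, rng) for _ in range(offspring)]
        # Keep the parent in the running so the score never goes down.
        parent = max(brood + [parent], key=lambda s: score(s, target))
    return parent, max_generations

result, generations = breed("METHINKS IT IS LIKE A WEASEL")
print(result, "after", generations, "generations")
```

Changing `offspring` and `rate` and rerunning is the kind of experiment an active essay invites the reader to do in place.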


It's worth pondering that despite the almost perfect fit of what Hypercard was about 
to the emerging Internet — and its wide adoption by the early 90s — the people who 
did the web decided not to adopt either it or the larger early Engelbart ideas. And 
Apple — who had many ARPA/Parc people in its research wing — refused to listen to 
them about the importance of the Internet and that Hypercard would make a great 
start as a symmetric reading and writing system for it. Apple declined to make a 
browser at a time when a really good one would have been significant, and might 
have made an enormous difference in how the “public face” of the Internet would turn 
out. 


If we move forward a few years, we find the absolute absurdity — even almost 
obscenity — of a web browser without a real authoring system (consider how silly wiki 
authoring had to be to even work at all), and as one of many prime examples, the 
Wikipedia article on e.g. LOGO, running on a computer, yet without any ability for the 
reader of the article to try LOGO programming from within the article. This meant that 
what was important about computers was blocked off from the users in favor of 
various embodiments of old style media. 


It's worth pondering that Wikipedia was and is a prime genre for thinking about, 
inventing, implementing, and writing “the literature of computing" that is needed (and 
that would certainly involve both reading and writing of many kinds of media 
including programming). 


It's even more worth pondering that I can’t write a program here in this Quora answer
— in 2017! — that will help show what I'm trying to explain, despite the immense 
computer power that lies under this weak idea of interactive media. This is completely 
missing “what happened?”. 


To get a glimmer of the problem, here is a system from 1978 that we resurrected a few 
years ago partly to do a tribute to Ted Nelson, and partly for the fun of it. (Please start 
here at 2:15)


The whole system is an early attempt more than 40 years ago at what I'm talking 
about. An explicit example can be seen at 9:06. Besides “dynamic objects”, one of
the key insights here is that the “views” — the media that are seen on the page — can 
be handled uniformly and independently of their contents (we called these “models”).
Everything is a “window” (some have explicit frames and some don't show their 
frames). They are all composited on the “project page”. Another insight was, since you 
have to compose and composite some things, make it so all things are composable 
and compositable. 


I think unsophisticated users are to be forgiven for not being able to criticize bad
designs. But computer people who make interactive media for users, and who do not 
take the trouble to learn about media and design, especially from the history of their 
own field, should not get off lightly, and should not be rewarded for “making ‘normal’ 
weaker". 


Finally, a field without a real literature is almost tantamount to not being a field. A 
literature is a way to keep the great ideas in a genre fresh and in the present and 
future of thinking within the field. This certainly doesn't exist in computing to any 
useful extent. Like a pop culture, computing is still most interested in what can be 
done without large learning curves, and where the doing is more important than the 
ramifications of the results. A literature is one of the environments where one can 
move from the simple and immediate to the larger and more important. 


We need it! 


Alan Kay
Still trying to learn how to think better - 6y


As a true engineer, how do you figure out what are the right tools to invent 
to help you do your work? 


I've worked with “true engineers” and I think of myself as an “enthusiastic hobbyist”
with some “scientific leanings”. Despite this, much of what I've done over the last 50
years has very large engineering components (you have to build things in computing 
to vet your ideas). 


If you are interested in engineering, then it is a very good idea to learn about great 
feats in the field over the years (and there are many). A fascinating course I had in the
engineering high school I attended (Brooklyn Technical High School), was “Industrial
Processes” — taught to all Freshmen: it was about how things are made, especially 
basic materials, and especially in quantity. 


A good example of large engineering to look at is the Empire State Building. It was 
started just as the Depression was starting, and there was a sense there would be no
more “tall buildings” for a while. 


The builders — The Starrett Brothers — decided to make the building a tribute to their 
trade and art. From the time of the start of demolition to the time of occupancy was a 
little less than one year, and the whole thing was done with around 3000 people. The 
process was rhapsodic if you love engineering, and much can be said about how they 
did it. There are a number of terrific books, including a facsimile of the daily log of one 
of the supervisors. 


For the purposes here, the brothers — who had built several other tall buildings in the 
city — were competing with a number of construction firms to get the bid. They were 
asked "What tools do you have for this job?”. The other firms assured the owners that 
they had all the tools needed. Paul Starrett said, “Not a blankety blank thing! Not 
even a pick and shovel. Gentlemen, this building of yours is going to present 
unusual problems. Ordinary building equipment won't be worth a damn on it. 
We'll buy and make new stuff, fitted for the job ... That's what we do on every 
big job. It costs less than renting secondhand stuff, and it's more efficient.” 


Some of the tools they built included a narrow gauge railroad for every floor of the 
building, enormous temporary elevators on the outside of the building, new methods 
of scheduling so accurate and comprehensive that the steel girders were still warm 
from the steel mills in Pittsburgh. And so forth. 


We loved this story at Xerox Parc, and one of the several reasons for its great success 
“working on unusual problems” was the same as for the Empire State Building: making
not just the stuff, but the stuff to make the stuff.


This is interesting to ponder, and it's an example of a saying where the first order 
version is true, and the second order saying is the opposite but also true. First order is 
“don't make your own tools, especially in hardware and software, computers, languages, 
operating systems, development systems, it’s a rathole you can get trapped in” 


Yep. But, “if you have the chops to make your own tools then you should and must” 
because you will be able to free yourself from other ideas, and unnecessary
workarounds, etc. 


The answer to this question depends on the chops you've developed. (Great 
engineering needs great chops!) So develop them well ahead of time if you are 
serious about any field. 


Meanwhile it's worth pondering just where computing is in comparison. Could we do 
“the largest of something” in less than a year and muster about 3000 people to do it? I
don't know of any such happening in our history. (Maybe we don’t know very much
about the engineering of our own field ...) 


Alan Kay
I read a lot - 6y

What books would Alan Kay recommend for people who want to become 
true engineers? How should they go about building projects? 


As with science, medicine, counseling, and many other fields, I think there are issues of
temperament as well as knowledge — there is a kind of “calling” here. And, I think, a
kind of “attitude”. 


Henry Petroski is an engineer who has written a number of very good books about 
engineering, and these can be perused to get a basic “feel from the outside”.
Another excellent engineer who writes very well is Sam Florman. There are several 
excellent talks and essays by Richard Hamming ... 


A key part of engineering is the love of doing things, and especially directly, and 
especially doing them well. A large part of engineering grew out of “tinkering” (aka 
“hacking”) with added desires for “principled designing and making”, “integrity”, etc.
All of the great engineers I know personally have deep moral streaks about what they
do, and about why it “should be done as well as possible”. Part of the overlapping 
temperament with science is that of the “lab rat”, who is happiest in being finicky
about an experiment or making a new piece of experimental apparatus.


If we make a Venn diagram of the historical progression of “STEM" we get an 
overlapping "TEMS" : Tinkering, Engineering, Mathematics, and Science. Most modern 
practitioners get good at all of them, and much of the best stuff is found in the sweet 
spot intersection of these. Great “do it” teams are formed from people who do a bit of 
all, but are most deep in one or two of the circles. I've had the most fun in my career 
working with great engineers, and having an engineering background from high 
school helped a lot (even though I strayed into science and math).


As far as advice is concerned, I'd urge not just tinkering and making, and not just 
getting fluent in all of TEMS, but to find internships etc where real things are being 
done, especially tough things. A lot can be learned by watching experts do their thing, 
and being able to do things along with them. 


A big revelation to me was the “attitude” part that I found when I accidentally wound
up in a grad school that was part of the ARPA community. The whole community was 
just “used to being able to fly by visions, and to make anything that was needed to 
make the visions real”. In that kind of culture, with that kind of confidence, and that 
kind of track record, it is much easier for learners to pick goals that really matter, and 
to find ways and colleagues to make them happen. 


Alan Kay
Still trying to learn how to think better - 6y


In software engineering, is it better to just figure it out and hack together a 
solution or to understand the fundamentals and then implement it? 
Originally Answered: In software engineering, is it better to just figure out and hack together a
solution or to understand the fundamentals and then implement it?


Interesting range of answers so far ... 


The first question I would ask is whether you are taking the word “engineering”
seriously or is your notion of “software engineering” an oxymoron, or an ancient sense 
of the word, as in “Library Science”. 


If you want to take “engineering” seriously, then it is worth looking at how it is done 
outside of software in civil works, aviation, electronics, bio, etc. All of these rely heavily 
on scientific theory combined with pragmatic model building and testing, and these 
use powerful CAD tools combined with SIMulators, and very often then linked to
FABricating facilities to make both extended prototypes and eventually many 
instances of the desired artifacts. (By contrast, the tools most computerists make for 
themselves are quite terrible, even embarrassing! For example, Vi, staying with
edit-compile-load cycles, not using SW simulators, etc.)


In praise of real engineering in computing, there is nothing more fun than to be a 
scientist-engineer working with an engineer-scientist (a great example of the latter 
was Chuck Thacker at Xerox Parc). 


My favorite line in the movie “Spy Game” is the spy talking to his assistant: “When did
Noah build the ark Gladys? Before the flood, before the flood”. Here meaning that if 
you are going to have situations where huge deadlines are put before you, then as a 
good “real engineer” you will have prepared by having great tools already in place. If 
you haven't done this, you are not taking “software engineering” seriously enough. 


As for “exploratory programming" — it is very important to do almost always — the 
trick is to be able to do it so easily and quickly that one is not tempted to save it. I
liked to do explorations — and found Smalltalk in its day really excellent for this — in 
order to get a better handle on the design parts of new ideas and goals. This kind of 
programming feels to me just like the pencil and pen sketches one does to 
understand form, and of which very few to none are adapted directly into the actual 
work (thank goodness for difficult to erase media (these force more visualizations and 
separate tests to be done)). These are also like the physical models of bridges, etc that 
are illuminating and necessary but not incorporated into the actual constructions. 


The last idea here is that many of the properties that help prototyping and exploration 
-can- be of great help in serious implementation: for example, how well does your 
programming language allow separation of “meaning” from “optimization”? Most 
languages and most programmers intertwine the two very different things, and this 
leads to difficult to understand and deal with dependencies, especially later in the 
learning curve when you are really finding out how your system should be 
implemented — and worse, much later when requirements have changed and the SW 
needs to change also. If you have a typical — bad — system for where we are in the 
21st century, you will likely try to patch rather than -safely- redesign within what has 
been done so far (hint: this is generally not a good method for any kind of 
engineering, including in software). 
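
One small-scale way to picture the separation of “meaning” from “optimization” is to keep the plain definition as its own function and check any faster version against it. The Fibonacci example and the names `fib_meaning`, `fib_fast`, and `agrees` are assumptions chosen for illustration, and real systems need much more than this.

```python
from functools import lru_cache

def fib_meaning(n):
    """The meaning: the textbook definition, written for clarity only."""
    return n if n < 2 else fib_meaning(n - 1) + fib_meaning(n - 2)

@lru_cache(maxsize=None)
def fib_fast(n):
    """An optimization: same relationship, with results remembered."""
    return n if n < 2 else fib_fast(n - 1) + fib_fast(n - 2)

def agrees(fast, meaning, cases):
    """Check an optimized version against the meaning on sample inputs."""
    return all(fast(n) == meaning(n) for n in cases)

print(agrees(fib_fast, fib_meaning, range(20)))  # True
print(fib_fast(80))   # far out of reach for the plain definition
```

Because the meaning survives as a separate artifact, the optimization can be thrown away and redone when requirements change, instead of being patched.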


What do you think the next great invention of the 21st century will be? One
as big as the internet. This idea of synthetic realities is something I see
having huge potential.


Alan Kay 
Still trying to learn how to think better - 6y 


Really smart people — like Ivan Sutherland — can figure out most of the implications 
of new ideas and technologies from the get go. Take a look at his 1965 paper The 
Ultimate Display and see what you think about your question. Then you could take
a look to see where VR and tactile sensing, etc., actually came from, etc. 


Alan Kay


Still trying to learn how to think better - 6y 


What are those "powerful ideas that basically mean the difference between 
life and death"? 


You can identify many of them. Two types are (a) “powerful ideas that have immediate 
benefit on life and death” — for example, the germ theory of disease and what is 
implied for sanitation, being able to make insulin to save diabetics, and a myriad more 
having to do with health and nutrition. And (b) "powerful ideas that are about ideas" 
— for example, “there are powerful ideas” (it's worth finding them!), “we are mostly 
blind in so many ways” (so use powerful ideas to make tools of many kinds to help us 
“see” — one of them is “science”, etc.). 


An example of a problem that involves both (a) and (b) is the estimated 100,000 to as 
many as 400,000 patients that die unnecessarily each year — many in hospitals — 
because their doctors are not careful about sanitation. Here we combine the powerful 
idea of agents too small to see that can kill us and can be spread by contact, with 
trained humans who have learned about these agents, and how to deal with them, but 
whose commonsense part of their brains keeps them forgetting about the very germs 
they learned about and actually saw in microscopes. 


In the large, it is making progress in (b) and forming deep habits around (b) that 
should be one of the main goals of education. We need to do much more “real 
thinking” rather than the kinds of “reacting and rationalizing” that are our genetic
heritage. 


Alan Kay
Still trying to learn how to think better - 6y


Why did Xerox fail to successfully exploit any of its ideas at its Palo Alto 
Research Center from the 1960s and 1970s? 


A common myth (and one easily exposed with a little poking around the web). The
simplest error is “the 60s” (Xerox Parc started in 1970).


Depending on how you count, there were around 7-9 major computer related
inventions from Xerox Parc in the 70s. One of them was the laser printer (by Gary
Starkweather). Xerox made many billions of dollars from this invention and paid off 
the entire investment in Parc many times over. 


I think this myth — this lie — was made — or at least promulgated — by business 
people who didn’t want to invest in these kinds of research processes, despite the
enormous return on investment to everyone. 


It's true that Xerox could have done more with some of the other inventions, but that 
isn't the point here. Consider that most of the inventions were “new industry making”, 
and needed to be accessible by all, as was the Internet (done by the larger ARPA/Parc 
culture). 


My biggest complaint has been that the commercial exploitation of most of the 
inventions was “half-way and in pieces", and this did not get fixed in the succeeding 
35 years or so. 


Alan Kay
I am the Alan Kay in question. - 6y


What is Alan Kay's opinion of Assembly and mainstream programming 
languages like Google Go? 

Originally Answered: Alan Kay, what's your opinion of assembly and mainstream PLs like Google 
Go? Is language important? 

John von Neumann was famously against using valuable expensive computer time to 
automate human tasks that could be done by cheap labor. When FORTRAN came out 
in 1956 someone asked him about “higher level programming”. He summoned a grad 
student and said “Go write me a sort program”. The grad student left. von Neumann 
said "Either that student will come back with a sort program, or he won't come back at 
all. That's higher level programming!" 


This story is too good to be true, but I heard it about 6 years after it happened, from
my boss at NCAR — Glenn Lewis — who was a grad student of von Neumann's at the 
time. (And an assembler was a waste of computer time according to JvN when you 
had students.) 


By contrast, Ivan Sutherland in 1962 while inventing modern interactive computer
graphics and object-oriented models, also decided to have the user “program” 
Sketchpad via simply giving it goals to accomplish (and completely bypassing the 
kinds of mechanisms of hardware digital computers of the time). The “compiler” of 
Sketchpad was a constantly running quite comprehensive “solver”. (It’s worth realizing 
here that eventually most of the programming of the future will be done this way — 
and it's worth trying to understand why CS hasn't really gotten to this after all these 
years.) 
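
To make “programming by giving goals” a little more concrete, here is a toy solver in Python. It is not Sketchpad's algorithm. It is a simple relaxation loop that nudges variables until the stated goals are nearly met, and the goal encoding, step size, and names are all assumptions for this sketch.

```python
def solve(values, goals, step=0.1, iterations=2000, h=1e-6):
    """Nudge each variable downhill until the summed goal error is small."""
    def total_error(v):
        return sum(goal(v) ** 2 for goal in goals)

    v = dict(values)
    for _ in range(iterations):
        for name in v:
            bumped = dict(v)
            bumped[name] += h
            # Numerical slope of the error with respect to this variable.
            slope = (total_error(bumped) - total_error(v)) / h
            v[name] -= step * slope
    return v

# The "program" is just a list of goals; each returns 0 when satisfied.
goals = [
    lambda v: v["x"] + v["y"] - 10,   # x and y should sum to 10
    lambda v: v["x"] - v["y"] - 2,    # x should exceed y by 2
]
result = solve({"x": 0.0, "y": 0.0}, goals)
print(round(result["x"], 3), round(result["y"], 3))  # 6.0 4.0
```

The user states relationships and never says how to compute `x` or `y`; finding values that satisfy them is left to the solver.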


In any case, pretty much all of the good questions about programming were being 
asked by the mid-60s, more than 50 years ago. Some of the machine architectures for 
answers to the questions were better than those available now. 


But still, the conflicts are virtually the same — programmers need to think in terms of 
desired relationships, but for advancing will need to do optimization of various kinds 
to get the throughput needed. And a lot of optimization comes down to finding ways 
to violate useful module boundaries without shafting the system, often by dipping 
lower into the causality structures. 


So the real question (I think) is not the above, but “how much of what kinds of code 
should be the max that can be stood". 


Looking at this would greatly help one of the main problems of any kind and level of 
language, and that is that most programmers wind up writing too much code, and 
especially code for which the language is poorly expressive. 


This is ironic — or worse — because one of the main points of a computer is that you 
can make new computers with them. A higher level language — or a low level one — 
besides doing what it was made for, can also make the escape from its own prison by 
allowing more suitable languages to be designed and made. 


In part, that's why we have so many languages — but for various — mostly bad — 
reasons they wind up as stove pipes rather than as a system of tools, each to be used 
where they really pay their way. But most computer people are still stuck in 
“algorithms and data structures” and haven't yet really gotten around to thinking in 
terms of systems ... 


Alan Kay
I am the Alan Kay in question. - Updated 2y


"If you don't fail at least 90 percent of the time, you're not aiming high 
enough.” - Alan Kay. What are some things Alan Kay has aimed for and 
failed at? 


Most things. (Consider the world of the idealist ...) 


A big turning point in my late 20s was to gradually move to evaluating on “quality of
effort” rather than “quality of results” (still wanting the latter to be high, but eventually
recognizing the real conflicts). Picasso had a great line for all of us: “A work of art is
never finished, it is only abandoned”.


There's a nice intertwining with the art of science here: science doesn't get “truth”, but 
it can get “something good", something above important thresholds of illumination, 
something that can save lives, that can lift lives. 


Added July 10, 2021 


Certainly one thing that I've put many years of effort into without making much of a 
dent is trying to improve education — especially public education — for elementary 
aged through middle school children. This definitely qualifies for “failing more than 
90% of the time”! and I think also “for aiming high”.


Alan Kay
I read a lot - 6y

In Alan Kay's answer about the best books to understand CS concepts, he 
mentioned that he would look for an excellent follow-on book to the “The 
Pattern on the Stone” book. Has he found it? 

Originally Answered: Alan Kay; in your answer about the best books to understand CS concepts, you 
mentioned that you will look for an excellent follow-on book to the “The Pattern on the Stone” 
book. Could you find it? 

There probably is at least one really good book out there that could be a follow-on to 
Danny's “Pattern on the Stone” book — but I haven't come across it. Perhaps some of
the Quora readers can make suggestions. 


I've known Danny since he was a teenager, and he has been a colleague and friend for 
many years. When he wrote his book in the late 90s, I was very taken by it, and asked
if he would mind if I wrote the “next logical level” for it — he thought that would be
great. However, as John Lennon once said “Life is what happens while you are making 
plans", and the project never materialized. 


I think today, one could take the 9 chapters in the book and make the next logical 
level be online experiences in actually building the types of mechanisms that Danny 
introduces. Not just the programming parts, but the “nuts and bolts” primary 
hardware parts. There is enough computing power on personal computers today to 
do workable gates-and-wires simulations of whole computers. The Silverman brothers 
— and others — have done some wonderful examples of “oldies-but-goodies” — for 
example the recreation of the classic 8 bit micro CPU the 6502.
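To give a feel for what “gates-and-wires” means at the smallest scale, here is a minimal sketch in Python. It is only an illustration, not the 6502 recreation mentioned above: every gate is built from a single NAND primitive, and the function names (`nand`, `full_adder`, `add_bytes`) are made up for this example.

```python
# Everything below is built from one primitive gate: NAND.
# Signals are the integers 0 and 1.

def nand(a, b):
    return 0 if (a and b) else 1

def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))

def xor_(a, b):
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def full_adder(a, b, carry_in):
    """Add three bits; return (sum_bit, carry_out)."""
    partial = xor_(a, b)
    sum_bit = xor_(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return sum_bit, carry_out

def add_bytes(x, y):
    """Ripple-carry addition of two 8-bit values; return (result, carry)."""
    carry = 0
    result = 0
    for i in range(8):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry

if __name__ == "__main__":
    print(add_bytes(3, 4))
    print(add_bytes(200, 100))
```

A learner can stack the same idea upward: adders into an ALU, latches into registers, and so on, with each layer small enough to be checked by hand.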


What's needed for the next level of Danny's book here would be a very simple CPU 
design (like the original ARM but simpler) that could be made online from just a few 
layers of architecture, and especially terrific simple CAD-SIM tools for beginners to 
help make the structures. 


Similarly, it would be fun to do a very simple bootstrap of a very powerful 
programming language (e.g. Lisp shows that super powerful can be super simple — 
this possibility of powerful-and-simple is one of the biggest ideas in both systems and 
computing). 


I think I would do the bootstrap somewhat in the style of Schorre’s Meta II (that was
almost the way we did the original Smalltalk). 


The last two chapters in Danny's book would be the most fun and the most 
challenging to create an accessible level for “AI” and “biological brains”.


The term “AI” today is used for such a restricted subset of “behavior” that the new
term “Artificial General Intelligence” now has to be used for what “AI” used to mean
(this is sometimes called “colonization”: when people want the cachet without the 
work, they often appropriate the term for what it is that they are willing to do — this 
also happened to “objects” in the 80s — we now have to call the original ideas "real 
objects"). 


I think it would do the readers of our hypothetical book a real service if we helped
them understand “real AI” by including a section on “real thinking” (as opposed to just
massive correlations). 


And there's a lot we could do with helping readers understand biological nervous
systems (perhaps by having them make some “organoids”?).


Summing up, I think Danny did a really great job in writing a “concepts” book for the
non-computer savvy reader. I think there is a real need for a “concepts-by-making-them”
followup that is both real and very simple. A lot of the complexity in computing
of all kinds has come from the admixture of optimizations, many historically needed, 
and also from poor designs. 


But for learners today with the computing power available today, a lot can be done 
with really simple mechanisms that have the same graceful style as Danny's and can 
be made and debugged and understood by the reader. 


Alan Kay 
Still trying to learn how to think better - 6y
What are some new kinds of businesses you would like to see? 


The economist Milton Friedman — perhaps the epitome of “dismal” in “the dismal 
science”? — famously declared that the sole purpose of business is to make money for 
the shareholders. 


A larger deeper mind — Peter Drucker — countered with the idea that the purpose of 
business is to best serve the customers. 


One of my favorite businessfolks — Spedan Lewis — formed the John Lewis 
Partnership in the UK to make a company of hundreds of stores — John Lewis 
department stores and Waitrose Markets — that are completely owned by all of the 
employees (embracing Drucker's idea with a new important dimension). 


If we step back to look at the larger processes going on, we find the painful paradox 
that the -potential for doing- (a deep way to look at wealth) in our societies stems 
primarily from large cooperative organizations such as science and democracy: all 
wealth ultimately comes from the energy we can extract and turn to our purpose, and 
from the open ended societies that allow creative progressive directions to be chosen 
by the participants. 


But, squinting our eyes at smaller groupings within this, we find what appear to be 
myriad tribes of neolithic pre-agricultural hunter-gatherers exploiting the lush valleys 
they find around them, moving on to the next when there are no more resources 
easily found, and competing in every way with the other tribes around them. 
Metaphorically, and sometimes literally, stripmining society. 


To find atavistic behavior in modern times is not surprising — we are genetically 
predisposed for many kinds of such behavior — but the lack of understanding of how 
to really make progress given that the history of the last several hundred years is 
readily available is not just surprising, but shocking, and now: quite dangerous. 


If one of the perspectives we can take on our situation is systems, then we need to 
consider “The systems that we live in, and the systems that we are” — for example, the 
intertwining of: the universe, our planet, our social-governance, our technologies, 
ourselves, our mentalities. 


Cooperative organization has produced great synergies in many areas, and the non- 
understanding that systems organizations are only dynamically stable and have 
toppling points now threatens our life on this planet. 


The powers that need to be better handled are both material and informational — the 
largest powers of our time are communicational. 


As Vi Hart has pointed out “We need to ensure that human wisdom exceeds human 
power”. 


The kinds of businesses — new and old — I'd like to see are those that also 
understand that their very ability to exist depends on the larger systems working well, 
and that it is a primary function of all systems to increase in the large both real 
progress and stability. 


In other words, it's not about “hunting-and-gathering” but more like gardening, and 
not just for food, but is about the “gardening and nurturing of humanity”. 


Alan Kay
I read a lot - Updated 6y


What are Alan Kay's favorite books? 


I don't know how to give a reasonable answer to this question, but I don’t want to
ignore the question either.


I have friends, and know of people, who have certainly read more books than I have,
but I've read enough to be stymied by questions about “favorites”. I have never tried
to count them (it wasn't a competition), but when asked about this a few years ago I
estimated “likely not more than 20,000”.


Not a lot when one realizes that the Library of Congress claims more than 23 million
books. Yikes!


In the 80s I was asked by a consulting company for my “top 10 books”, and came up
with several hundred. This list has been around via various hands, and one version of it
— Alan Kay's Reading List — was put online a few years ago.


One of the ways to think about this is that there are many more than "10 Great Ideas” 
and each one would likely at least be a book — and a good idea will probably give 
rise to at least one good book — and therefore we are really constricting our view to 
think of 10 or even 100 or maybe even 1000 top (and favorite) books. 


However, there have been books over the years that have seemed really special for
various reasons — I think of these as “top 10” even though there are probably many
more than 10 if I were to try to write them down.


Just to take one as an example: Newton's “Principia”. I have never forgotten the
combined shock and thrill of making my way through this in my 20s. The greatest 
single leap ever? That's probably too hyperbolic, but “oh my goodness”. Add in the 
amazing style and organization that he picked for this book, and it seems to me to 
shine above all others. 


A formative book in so many ways was “20,000 Leagues Under The Sea” by Jules Verne. 
The first translation I read as a kid was not the best one (I recommend the Anthony
Bonner translation today). But “Capitaine Nemo c'est moi”. 


Ray Bradbury wrote a beautiful musing that compared “20,000 Leagues” with “Moby
Dick”.


Much earlier, the first adult book I read all the way through was my father's copy of
“Mythology” by Edith Hamilton. I had one of my first “mental wakeup calls” when I got
to the Norse gods part and realized they were quite parallel in character and process
to the Greek and Roman gods in the front part of the book. This got me to realize
these “were just stories” that had been made up for reasons I didn’t yet understand. A
few more books like this, and by the time I got to first grade I had been ruined for the
“single book — single truth” ideas of school and church.


Another book that I've loved — as a lovely book to read — was the original “Molecular
Biology Of The Gene” by Jim Watson (ca 1965) — and its truly wonderful offspring
“The Molecular Biology Of The Cell” by Bruce Alberts et al (with Watson as one of the
authors). This is a favorite for a different reason: for many years it has been the best
single volume narrative of “life from scratch” (I particularly love the 3rd Edition).


A favorite book is “The Lisp 1.5 Programmers Manual” by John McCarthy, et al. I have
bought many copies of this book, both for its mojo and to give as gifts. It has been 
published by MIT Press since the early 60s, and is still in its original typescript form. 
Why this book? It’s kind of “the way things should be done”. The front section — by 
John — is especially fine and is a straightforward introduction to the language done in 
a kind of axiomatic style. Starting with a few primitive functions, he writes other 
functions showing how to manipulate symbols, and finally uses some of these to write 
the entire language in itself on the bottom half of page 13. I have called this the
“Maxwell's Equations” of computing, because it presents a very large part of what's 
important about programming languages "in an eyeful” and “in a brainful” which 
provoke and allow further “nifty thoughts”. 


And to pick another favorite: “Computation: Finite And Infinite Machines” by Marvin 
Minsky. This is another really beautiful book, in approach, style, humor, and depth. It's 
essentially a math book, but one doesn't have to bring much math to it, just one’s 
mind that will be happy to touch some mathematical thinking. It was from this book 
— and an amazing Marvin way to show a Lisp like language using Goedel Numbers 
(hee hee) — that I first learned about Lisp 50 years ago.


But how could | leave out a very different kind of book: “The Organ-Builder” by 
Francois Bédos de Celles? This is a several volume set from Diderot's 18th century 
many-volumed Encyclopedia, which aimed to capture everything that was known in
his day. In this case “really quite everything about organ building in France”. The
edition I have was from a subscription for a beautiful translation by Charles Ferguson.
One of the volumes has large reproductions of the many hundreds of engravings
showing every technique, every tool, every pipe kind and scaling, etc. I was very lucky
30 years ago to help master organ builder Greg Harrold build a mid-18th century 
organ (for Berkeley) that only used these techniques and “showed no modern tool 
marks". 


For favorites, I have to include “Art In The Blood — A Sherlock Holmes Adventure”
written by my wife Bonnie MacBird (the original writer of Tron), and also the website
that includes her annotations and original drawings. This was a labor of love on her
part, and she completely nailed the Arthur Conan Doyle voice of the characters and 
narrative, while being able to carry a marvelous story into the much larger realm of 
the novel (a difficult territory for Holmes stories, and one that Doyle himself never 
entered). 


And so it goes. We could all make out a list like this. 


In the reading list mentioned above, I often said “any book by author X”, and I still
think more in terms of authors than single books (I have a feeling most extensive 
readers do). It’s not that every book by a particular author will be equally enthralling, 
but I've found it really worthwhile to try to read all the books by an author if my first 
encounter was positive. 


I should also point out that it’s the fate of most ideas — and most books — to not be
great. But, again, reading a lot of books will reveal that many not so good books 
somehow help a bit with the great ones. This is because context is the larger part of 
understanding. It provides the multiple perspectives we need to avoid religious 
feelings about any of our ideas. From this standpoint “most books help at least a 
little”. 


I will probably return to this answer a few times to tweak it up.


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 6y


What if we were able to prove that AGI is beyond the computational power 
of the Turing machine? 

I think Cherie Woo below has a good approach to this (it was the philosophical
approach used in the 60s for doubters of “eventual AI”).


Philosophically, the question is “Can we in principle simulate physics well enough to 
make the biology we have, and to thus make the brains that we have?” 


I think the answer is “probably”.


Engineeringwise, the question is “Can we simulate AGI efficiently enough to think the 
same kinds of thoughts (or better) that we think?”. 


I think the answer is “probably”. (And almost certainly the solutions will not be at the
level of physics.) 


Alan Kay
Have designed a few programming languages - 6y

To what extent, if at all, did Smalltalk borrow from Lisp's philosophy or 
ideas? 

As Mark Miller noted below, a fairly complete history of Smalltalk can be found in The 
Early History Of Smalltalk. Part of the history is an explanation of how long it took
me to see the deep implications of what was in front of me (this is because I started
with a conventional mindset about computers and languages, and that made me 
pretty blind to what was new and important). 


The simple answer is Yes! 


In the context of Biology and Mathematics, the object ideas came primarily from 
Sketchpad and Simula (and cells), the messaging ideas came from thinking about 
objects as being like cooperative processes in a time-sharing system or between 
computers on a network (and cells), and a lot of the metathoughts and underpinnings 
were directly influenced by Lisp (and math and cells). 


There were a number of revolutionary architectures to see in the early 60s — 
Sketchpad, Lisp (basically Lisp 1.5), the B5000 (a computer that could directly execute 
byte-codes completely safely, and was essentially a multiprocess “Simula machine” 
before there was a Simula), and APL (not yet implemented, but in Iverson's book). 
And even Algol (with its recursion, call by name, nested block structure, etc.) was
revolutionary at the time. (And there was quite a lot more to see as well, especially
some of the meta systems for generating languages from metalanguages, etc.) 


Just sticking with Lisp here, what was cosmically interesting was the idea of Lisp, 
meaning that there could be something mathematical about programming that could 
also be practical — that is: you could do something with relations and inference — 
and most especially, that a powerful universal language could also act as a practical 
metalanguage — not just for implementing, but for thinking about programming and 
programming languages. 


For example, if you look at McCarthy's beautiful lead up to the bottom of page 13 in 
The Lisp 1.5 Programmers Manual, you can see how John thought about these
ideas. And once the whammy of the half page interpreter in itself has been grokked, 
then you will start thinking new thoughts about the semantics of programming and 
programming languages. 


It is also worth looking at some of the rest of this little book to see how the 
implementers of Lisp went about their business. One thing that made it into the 1.5 
version was the realization that “special forms" — which seemed to be called for by 
the functional underpinning of Lisp — could be thought of as a base “functlet” 
delayed parameter evaluation idea, and that normal functions could just be a special 
case of this. (Many different interesting thoughts came out of this realization, which in 
turn came out of needs for implementing “special forms” such as QUOTE, LAMBDA, 
COND, etc.) 
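The “delayed parameter evaluation” idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Lisp 1.5 implementation: symbols are Python strings, lists are Python lists, the class name `Operative` and the helper `applicative` are invented for this example, and `lambda` here closes over its defining environment, which is a simplification. The point is that every operator receives its argument expressions unevaluated, and a “normal function” is just the special case that evaluates all of them first.

```python
class Operative:
    """An operator that gets its argument expressions unevaluated,
    together with the caller's environment."""
    def __init__(self, fn):
        self.fn = fn

def applicative(fn):
    """A normal function: the special case that evaluates every argument."""
    return Operative(lambda env, *exprs: fn(*(evaluate(e, env) for e in exprs)))

def evaluate(expr, env):
    if isinstance(expr, str):        # a symbol: look it up
        return env[expr]
    if not isinstance(expr, list):   # a literal such as a number
        return expr
    operator = evaluate(expr[0], env)
    return operator.fn(env, *expr[1:])   # arguments are passed unevaluated

def _cond(env, *clauses):
    # Only the tests, and the one chosen result, are ever evaluated.
    for test, result in clauses:
        if evaluate(test, env):
            return evaluate(result, env)
    return None

def _lambda(env, params, body):
    def call(*args):
        return evaluate(body, {**env, **dict(zip(params, args))})
    return applicative(call)

GLOBAL = {
    "quote": Operative(lambda env, x: x),
    "cond": Operative(_cond),
    "lambda": Operative(_lambda),
    "+": applicative(lambda a, b: a + b),
    "<": applicative(lambda a, b: a < b),
    "t": True,
}

if __name__ == "__main__":
    print(evaluate([["lambda", ["x"], ["+", "x", 1]], 41], GLOBAL))
```

Notice that `evaluate` itself has no special cases for QUOTE, LAMBDA, or COND; they are ordinary entries in the environment that simply choose not to evaluate some of what they are handed.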


It's also worth looking to see how the practical dynamics of implementing and using 
overwhelmed some of the cleanliness of the initial ideas. Part of what went on 
happened “too fast” for the kind of thinking needed for great design. This led to some 
kludges. (And very similar things happened with Smalltalk later on once we started to 
use various versions of it.) 


Some of the kludges and not quite solutions came about with regard to functional 
arguments, the “prog” feature, scoping of variables, closures (Lisp 1.85 at BBN had 
four different, and interesting, notions of “closure”, etc.). 




Another kludge was leaving out John's “m-language”, which was supposed to be a
much more readable way to do Lisp programming, in favor of just programming in the
“machine code” of S-expressions. (There is another whole history of the ramifications
of this “basically lazy” choice of the early implementers.)


Most of these kludges were cleaned up, sometimes years later, and some in other 
languages. Interestingly, McCarthy had had some influence on Algol (for example, 
Algol conditionals), and Algol had some of the solutions that Lisp needed (such as 
static nesting scope, and the “call by name” mechanism for functional arguments — 
however, it is worth noting what was positive about the dynamic variable scoping, 
especially in some AI reasoning processes ... it would have been interesting to allow
both ...). 


But the Lisp culture at MIT was quite in their own world. It is interesting to read some 
of the years later Scheme design documents to see the “rediscovery” in the 70s of 
some of the Algol mechanisms (some through the work of Peter Landin in the 60s). 


Still, even with kludges, there is a lot to think about here, and in compact forms that 
leave some brain-cells left for actually pondering programming semantics. 


All this “understandable kernel” lore should lead to variant thoughts — especially 
these days when there is much experience that can be brought to bear. 


Here's my favorite “most important unknown thesis in programming language
thinking”: Dave Fisher's 1970 CMU thesis Control Structures For Programming
Languages. He uses the “Lisp approach” in a very sophisticated way to make a
deeply meta extensible language. Much can be learned about many important things 
from grokking the first 100 pages of this work. 


The other quite wonderful much more recent piece of work that needs to be 
mentioned here is "The Art of the Metaobject Protocol” by Kiczales et al. (MIT Press). 
This is deep deep "Lisp thinking” in the service of objects, and vice versa. 


Just to mention one interesting “variant thought” to close off this too long answer: it's 
worth pondering the split of “eval” and “apply” in McCarthy's interpreter. “Apply” is 
tantamount to sending a message to an object, especially if parameter evaluation can 
be controlled on the receiving end by the object itself, and especially if the receiver is 
a closure. Now just make this whole mechanism as loosely bound as the rest of the 
ideas in Lisp, and you have a very simple basis for a different look at computation and 
programming. (And don't forget to note the implications if everything were actually in 
the form of property lists, not just raw storage cells ... .) 
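One way to picture “apply as message send” is the small Python sketch below. It is a toy under assumptions, not how Smalltalk or Lisp 1.5 actually did it: the names `Receiver`, `send`, and the `on_` prefix are invented here, and argument expressions are modeled as Python callables that take the sender's environment. The receiver looks up its own handler at send time and decides for itself which arguments get evaluated.

```python
class Receiver:
    """An object that is handed a selector plus unevaluated argument
    expressions, and a way to evaluate them if it chooses to."""
    def receive(self, selector, arg_exprs, evaluate_arg):
        handler = getattr(self, "on_" + selector, None)   # late-bound lookup
        if handler is None:
            return self.does_not_understand(selector)
        return handler(arg_exprs, evaluate_arg)

    def does_not_understand(self, selector):
        raise AttributeError(
            f"{type(self).__name__} does not understand {selector!r}")

class Boolean(Receiver):
    def __init__(self, value):
        self.value = value

    def on_if(self, arg_exprs, evaluate_arg):
        # The receiver evaluates only the branch it selects.
        then_expr, else_expr = arg_exprs
        return evaluate_arg(then_expr if self.value else else_expr)

class Number(Receiver):
    def __init__(self, value):
        self.value = value

    def on_plus(self, arg_exprs, evaluate_arg):
        # An "ordinary" method: it evaluates its one argument.
        (other_expr,) = arg_exprs
        return Number(self.value + evaluate_arg(other_expr).value)

def send(receiver, selector, *arg_exprs, env=None):
    """The 'apply' step: hand the message to the receiver unevaluated."""
    env = env or {}
    def evaluate_arg(expr):
        return expr(env) if callable(expr) else expr
    return receiver.receive(selector, arg_exprs, evaluate_arg)

if __name__ == "__main__":
    total = send(Number(2), "plus", lambda env: Number(env["x"]), env={"x": 40})
    print(total.value)
```

Conditionals need no special syntax in this picture: `if` is just a message that a Boolean answers by evaluating one of its two arguments.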


Alan Kay 
Still trying to learn how to think better - 6y 


What tools are useful when outlining and drafting a CS dissertation? 


The number one tool is to have good results that are worth writing up. Too many 
dissertations these days are too incremental, and often from small papers done over 
the course of grad school — try to work on a real problem, and get real results. Dave 
Evans used to say that “A PhD is two years of world class research”. If it was a huge 
problem, then he allowed the first two years of real progress to be the dissertation. If 
it was a big problem, then he wanted a solution. He didn’t want anything that wasn’t a 
real advance to the state of the art. 


Ivan Sutherland, when asked “What is a PhD thesis?”, said “It's something you can get 
three people to sign!” 


So, first talk to your committee — and if things are early in the game, you can take the 
best actions by helping to put together your committee — especially to find a 
sympathetic outside member. 


I would next suggest that you read a number of theses to get a sense of their form
and rhetoric, and especially to find several that seem to fit your topic and argument — 
and personal style. And, for example, what are theses like at your institution? 


Ivan's thesis on Sketchpad is a classic in every sense. Many of Marvin Minsky's
students in the 60s and 70s wrote classic “MIT theses”, in which the first few chapters, 
especially, were extremely well structured. 


I don't understand what you mean by “accumulate points” — that sounds ominous.


I'd try to avoid LaTeX if possible — and if not, to just use it at the very end.


Some people can plan and write top down, and some will have a much easier time to 
simply organize ideas to try to find a good sequence. In either case, if you care about 
the writing at all, you will need to put together something that is a kind of narrative. 


One of the ways to do this is to see if you can write a one or two page summary of the 
whole thing, or perhaps to write a slightly longer paper, as it might appear in a 
proceedings. This will provide an overall narrative structure that can be more useful
than an outline. The two main ways to expand this are to (a) add in details, or (b) keep 
the short paper as the intro chapter, and think of the rest as “appendices” that contain 
details and references. 


Alan Kay
Have designed a few programming languages - Updated 6y

What are the pros and cons of developing software from the top-down and 
from the bottom-up? 

Somewhat harmonious with some of the other answers: try making a small scratch 
throwaway working model first to get an overall sense of the system design and 
relationships needed. 


This is especially useful when working in a language system that has a huge library of 
already made stuff (which often tempts “creation by searching for features” rather 
than thinking about the actual problem). In other words, it is often helpful to treat a 
library as possible optimizations that might later be useful, but design is usually best if 
optimizations are left out in the first pass. 


This route is most useful and fun if the language and system is amenable to quick 
“sketches”. Too much work and delay leads to hacks and tendencies to retain the 
throwaway model (the chances are low that the first sketches are what you actually 
want to do, so things have to allow the designer/prototyper to be able to move on 
without a feeling of loss). 


Just as an example, “workspace programming" in Smalltalk could have been even 
better, but it allowed a lot to get tried out before making big commitments to class 
relationships, and all the code would be a single “eyeful” in the workspace. One thing 
that helped was that a lot could be done in Smalltalk in a few lines, and this kept the 
experimental code small enough to “be thinkable” rather than just “doable”. 


And, but, ... consider the difficulties involved in taking a completely malleable lump of 
clay and trying to debug it into something beautiful! What's needed is a “vision/sense” 
of what is to be created. (This is why “sketches” need to be thrown away — their 
purpose is to help form the inner senses ...) 


Alan Kay
Have designed a few programming languages - 6y

What do Alan Kay and other message-oriented programming pioneers 
think about Ruby? 

The real question is: how should programming really be done ca 2017? And let's also 
pick a check point around 2027 to help think about my first question. 


Alan Kay 
Still trying to learn how to think better - 6y


Has PARC (formerly Xerox PARC) become just another company? Has it lost 
its original identity? 

Parc was four laboratories: Physics, Optics, Computer Science, and Systems. The last 
two were the most associated with what people think of as “Parc”, and the deeds most 
associated with “Parc” were mostly done between 1971 and 1983 when Bob Taylor and 
most of his lab left (I left in 1980). Some good people remained. And there were lots
of very good researchers in the other two labs. 


In my mind “Parc” ended when Taylor and his main people left. As you can see, this is 
not completely fair to those who remained. Still, to me the place was about top 
researchers having the resources to carry out their research. When “top-downess” 
gradually was imposed starting in the late 70s, that was the beginning of the end. 


Alan Kay


Still trying to learn how to think better - 6y 


How much cash did Xerox have on hand when it formed Xerox PARC? 


Worth checking. My recollection is that they were about a $6B/year (1970-71 dollars) 
company at the time Parc was set up. 


Alan Kay


Still trying to learn how to think better - 6y 


What kind of cheese makes the best grilled cheese sandwich? Why? 

Kind of redundant given all the answers, but this is the kind of dish that we react to 
personally ... a great grilled cheese sandwich is my favorite comfort food, and my 
notion of it is from the ones my mother used to make for us in New England in the 
40s and NYC in the 50s. 


One of the New England tastes is brown butter on bread, and the rolls made for hot 
dogs and “lobsta” are white on the outside so they can be lavished with butter and 
then browned in a pan. And this is also the case with grilled cheese sandwiches (at 
that time called “toasted cheese sandwiches"). 


The bread back then was “Wonder Bread” (kind of terrible air puff, but redeemed by 
frying in butter), and the cheese was “processed American” or “doubly processed 
American” (i.e. Velveeta). 


My dad was Australian and loved sharp cheddar cheese with apples. Sometimes he 
would part with some cheddar for the grilled cheese sandwiches. Sometimes he would 
pop apple slices into the sandwiches — not bad but to me missing the essential taste. 


It was years before I found out that “bread” could taste good (or even have taste).


My favorite version 70 years later uses multigrain/nut bread which really grills up 
wonderfully, and the sharp cheddar of the kind my dad loved. Great French baguettes 
are also terrific when grilled. I like to put a plate on top to weigh the sandwich down
in the pan, and to cook each side just short of burning the butter, so the result is very
dark and at its tastiest.


Alan Kay
Still trying to learn how to think better - 6y


What is the meaning of life in 5 words? 


“We make our own meanings” 


Alan Kay 
Worked at Apple (company) - Updated 4y


What was it like to be at Xerox PARC when Steve Jobs visited? 


A good enough answer would be longer than is reasonable for Quora, but I can
supply a few comments to highlight just how little attention is paid in the media,
histories, and by most people to find out what actually happened. For example, I was
present at the visit and demo, and it was the work of my group and myself that Steve 
saw, yet the Quora question is the first time that anyone has asked me what 
happened. (Worth pondering that interesting fact!) 


First, it's worth understanding that many people (perhaps even a thousand or more) 
had seen live demos of the Alto and Smalltalk before Steve. This is because Steve 
showed up in 1979, and the Alto and Smalltalk had been running for 6 years (starting 
in the first half of 1973), and we were a relatively open lab for visiting colleagues and 
other interested people (like Herbie Hancock and Al Gore). 


Many more people had read articles that I'd written (e.g. in Scientific American, Sept 
1977), and one with Adele Goldberg (in IEEE Computer March 1977). These included 
many screen shots and a wide variety of Smalltalk media. For example, here is the 
provocative title of the 1977 SciAm article: 


Microelectronics 
and the Personal Computer 


Rates of progress in microelectronics suggest that in about a decade
many people will possess a notebook-size computer with the capacity
of a large computer of today. What might such a system do for them?


by Alan C Kay 


And a picture of the Alto being used by children: 


[Photograph from the article; the scanned caption text is not legible.]


And a picture of the Alto screen and caption from the SciAm article in 1977 (SciAm did
a bad scan of this; if I have time I'll find the physical issue and do a better scan):


[Screen image of the Alto showing overlapping windows. The scanned caption is largely
illegible; it describes windows as display frames within the larger display screen.]


The readership of SciAm in those days was about 2.5 million, so a lot of people saw 
this. 


A second important fact about the 1979 demo to Steve was that he missed most of
what we showed him. More than 15 years later he admits this in this interview: How
Steve Jobs got the ideas of GUI from XEROX, where he says that we showed him
three things but he was so blinded by the first one (the GUI) that he missed both
networking and real object-oriented systems programming. (A fun part of this is that
Steve, after praising the GUI to the skies, realizes what he’s saying and immediately
says “but it was flawed and incomplete”, etc. This was his way of trying to be “top gun”
when in a room where he wasn't the smartest person.)


And, actually, he missed a few things about the GUI (for example, that it had unlimited 
and persistent “desktops” which could be used to sustain work/projects over time 
without having to tear down and build up, and without stovepiped apps, etc.) 


The demo itself was fun to watch — basically a tag team of Dan Ingalls and Larry 
Tesler showing many kinds of things to Steve and the several Apple people he brought 
with him. One of Steve's ways to feel in control was to object to things that were 
actually OK, and he did this a few times — but in each case Dan and Larry were able to 
make the changes to meet the objections on the fly because Smalltalk was not only 
the most advanced programming language of its time, it was also live at every level, 
and no change required more than 1/4 second to take effect. 


One objection was that the text scrolling was line by line and Steve said “Can't this be 
smooth?”. In a few seconds Dan made the change. Another more interesting objection 
was to the complementation of the text that was used (as today) to indicate a 
selection. Steve said “Can't that be an outline?”. Standing in the back of the room, I
held my breath a bit (this seemed hard to fix on the fly). But again, Dan Ingalls 
instantly saw a very clever way to do this (by selecting the text as usual, then doing 
this again with the selection displaced by a few pixels — this left a dark outline around 
the selection and made the interior clear). Again this was done in a few seconds, and 
voila! 
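

The double-complement trick can be sketched in a few lines. This is a minimal Python illustration, not the Smalltalk-78 code: the bitmap is a hypothetical list of rows of 0/1 pixels, and xor_rect and outline_selection are names made up here. Complementing a rectangle and then complementing it again a few pixels over flips the overlap back to what it was, so only a thin band along the edges stays inverted.

```python
def xor_rect(bitmap, x0, y0, x1, y1):
    """Complement every pixel in the half-open rectangle [x0, x1) x [y0, y1)."""
    for y in range(max(y0, 0), min(y1, len(bitmap))):
        row = bitmap[y]
        for x in range(max(x0, 0), min(x1, len(row))):
            row[x] ^= 1


def outline_selection(bitmap, x0, y0, x1, y1, d=1):
    """Turn a complemented selection into an outline (assumed displacement d)."""
    # First pass: the usual selection highlight (complement the rectangle).
    xor_rect(bitmap, x0, y0, x1, y1)
    # Second pass: the same complement, displaced by d pixels in x and y.
    # Pixels hit twice return to their original value; only the edge bands
    # that are covered by exactly one of the two rectangles stay inverted.
    xor_rect(bitmap, x0 + d, y0 + d, x1 + d, y1 + d)


if __name__ == "__main__":
    screen = [[0] * 8 for _ in range(8)]
    outline_selection(screen, 2, 2, 6, 6, d=1)
    for row in screen:
        print("".join("#" if pixel else "." for pixel in row))
```

The interior of the selection is left untouched, which is why text inside it stays readable. The exact displacement Dan used is not stated above; d=1 is only an assumption.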


The Smalltalk used in this demo was my personal favorite (-78) that was done for the 
first portable computer (The Parc Notetaker), but also ran on the more powerful 
Dorado computer. For a fun “Christmas project” in 2014, several of us (with Dan Ingalls 
and Bert Freudenberg doing the heavy lifting) got a version of this going (it had been
saved from a disk pack that Xerox had thrown away). 


I was able to use this rescued version to make all the visuals for a tribute to Ted
Nelson without any new capabilities required. The main difference in the tribute is that 
the revived version had much more RAM to work with, and this allowed more bit-map 
images to be used. This is on YouTube, and it might be interesting for readers to see 
what this system could do in 1978-79. 


Alan Kay
Still trying to learn how to think better - 6y
Who was the first to introduce multiple desktops, Linux or macOS? 


I hate to bring up the “P” word, but Smalltalk at Parc had unlimited persistent multiple
desktops starting in the mid-70s with Smalltalk-76. They were called “projects”. All 
Smalltalk systems since have this as part of the system's offerings. 


You can see a revival of a Smalltalk-78 system in the tribute video | did for Ted Nelson 
a few years ago: Alan Kay's tribute to Ted Nelson at "Intertwingled" Fest. I did all
the visuals for the tribute in Smalltalk-78. 


The multiple desktops are used heavily (note that besides supplying multiple 
workplaces, they also create a much richer media environment than apps like 
PowerPoint, because in Smalltalk you can bring any and all objects to each “place” and 
integrate them — and they are “live” not canned). 


I tried for many years to get Apple to adopt this (and a few other good ideas from the
60s and 70s) but to no avail. 


Alan Kay
Worked at Hewlett-Packard (company) - Updated 6y
What is your review of The HP Way (1995 book)? 


Interesting that Quora will not allow me to write an additional answer ... so I'll flesh 
out what I wrote previously, which starts with:


I'm currently traveling, and would like to take a peek at the book to make sure I
remember it from getting and reading it when it first came out, so I'll postpone much
of the review until I get back and can find it in my library.


I couldn't find the book (from 1995) in my library so I ordered the paperback version
that was published in 2005, which included several new highly valuable additions, 
including an excellent foreword by Jim Collins (author of the book Built To Last) that 
summarizes much of what is important about “The HP Way”, and a transcript of a 
speech Dave Packard gave to the HP managers in 1960. Both of these are worth the 
price of the book. In the previous note I then picked up with:


But I can say that HP in the 60s and 70s was certainly one of my all time favorite
companies, and I visited quite a number of times when I was a grad student and later at
Parc. I knew Bill very slightly, did not know Dave, and was friendly with a number of
other HP luminaries (such as Tom Osborne, one of the main forces behind the HP-35 
“pocket computer”). 


What was impressive to me about HP initially was the large number and range of 
extremely high quality products (eventually thousands). Later, I was fascinated by the
“HP process” of “intrapreneuring”: how they were able to invent and also take
products to market. Still later, during and after the Xerox experience, I was -really-
impressed with their process.


The main body of the book has much of historical interest — not just the rise of the 
company, but how they thought about what kind of a company it should be and 
become, how it should grow, and much of interest about growing pains, and a 
number of the routes they decided to take to deal with the immense scale-up that the 
company created. Readers will find much in common with some of my other Quora 
answers about the ARPA/Parc research community. 


However — and especially given the difficulties and absences of today’s funding for 
long-range research — what should be of most interest, is how HP was able to fund 
and do a considerable amount of research internally and combine this with its 
amazing “intrapreneurial” processes — not just “innovation” to take products into 
manufacturing and to market, but do this in an atmosphere of thousands of different 
kinds of competing products. The Collins book is a bit more useful than “The HP Way”
on this score: Built to Last: Successful Habits of Visionary Companies (Harper Business
Essentials), by Jim Collins and Jerry I. Porras (ISBN 9780060516406).


Another very useful book for those interested in these kinds of processes is The HP
Phenomenon: Innovation and Business Transformation, by Charles H. House and Raymond L.
Price (ISBN 9780804752862). In many ways, this is a perspective on the
HP process from one of its outstanding engineers (Chuck House was one of the most
productive, and a wonderful character besides). He was also involved in some of the
aspects of the following story from my first note:


While at Parc, we were also interested in what HP was doing with its “21MX” computers
(the first HP computers to use DRAM memories). The two most interesting properties were
(a) a very fast microcycle time (similar to the Alto), at much lower cost, in a real product,
and (b) that everything important about the Alto was quite missed, including what the
display and UI should be like, and what kind of software and applications should be run. That
contrast has stayed with me (it’s another one of those “Context is worth 80 IQ points” 
situations). Basically, they already had a mini computer without a real graphics display, 
and they stuck with this when the revolution of cheap DRAM and other integrated 
circuits happened. In theory, they should have been the company to do the personal 
computer revolution — consider the HP-35! — but quite missed it. 


My research group and I did work for HP (ca. 2003-2006), but this was a completely
different company which just had the same name, and this experience doesn't have
anything to do with Packard's book.


I think a good way to finish this review would be to specifically refer the interested 
reader to page xvii of Collins’ foreword, and his observations about the difference 
between understanding the principles — and applying them — vs labeling some of 
the resulting processes and trying to extract further good from them (my analogy 
would be the difference between the ideas of a benevolent founder of a religion and 
the reification of the “trappings” into rituals that mostly quite miss the original 
intentions). 


Alan Kay
I am the Alan Kay in question. - 6y

Why did Alan Kay say MS-DOS was a distressing thing? 

I believe that you can praise in a few sentences, but that criticism morally demands
longer and more careful arguments. I don't know how to be fair about a criticism on
this subject in a Quora-sized answer, but I can point you in a few directions to see
what I'm driving at.


First, you can learn a lot by looking at the states of the art of OS designs in the 60s 
and 70s. (Here there are some that are easy to praise.) For example, the Burroughs 
B5000/5500 MCP OS designs starting in the very early 60s. 


Another one of my favorites from the mid-60s is the “Project Genie” (later SDS 940)
OS mostly designed by Butler Lampson (this was the OS used on the famous
Engelbart “mother of all demos”).


Another is Doug McIlroy's view of Unix (and the original notion of how a tiny kernel
should be able to be extended by user processes), ca. 1972 or so.


And, I also liked the “it’s all a live OOP language down to the metal” approach we
used for Smalltalk at Xerox Parc. 


Worth a notable mention among small-machine OSs are certain aspects of the Amiga OS.
The above is not an exhaustive list, but it’s a starting place. 


You should also look at the progression and dates of both the bottom (kernel) and the
top (UI) of the MS operating systems, and what features they tried to add, and when.
And, it's worth noting the large change-outs along the way: for example, what were
the needs besides simple scaling for NT (which was more like a “real OS”, albeit an old
mainframe-type design)?


The basic idea here is that in software design, at worst we should be reinventing the
wheel (this is sometimes just right), but “reinventing the flat tire” should be criticized
heavily.


There have been even more severe cases in software of people “reinventing the
broken wheel”, that is, producing an artifact that can't really be seen as a wheel and
where it is not clear how to go about reshaping it into a wheel. (This is hard to do after 
one has taken the time and put forth the effort to learn about wheels, and how to 
make them.) 


Another way to look at it is that very few people are as smart and capable as Butler 
Lampson (or Bob Barton and the B5000 folks). Why not start with “the best so far”? 


Alan Kay
Still trying to learn how to think better - 6y


How can life form from nothing? 


There are four parts to this. 


1. The easy part is that many constructed things have very different properties than 
their parts. For example, we can start off with materials that don't do what wheels do, 
and yet make a wheel, make a car, make a radio, etc. 


2. The medium part is that from many different kinds of materials — from water and 
gas valves, to doorbells, and even Tinkertoys — we can first — easily — make a 
universal logic gate — and then by just organizing these in different ways we can 
make a programmable computer that can do anything any computer can do including
looking at and writing its own programs. 


3. A fun part is that the analysis of living stuff indicates it is all composed of a few 
kinds of atoms. Molecular biology over the last 100 years or so has found out many 
(most) of the organizations of atoms needed to make the different kinds of molecules 
needed for life. “What Life is” is quite well understood. 


There are many useful analogies to the much easier to understand levels of 
architecture of computer stuff that give rise to very complex systems (and systems of 
systems) etc. This is worth understanding, and then taking a look at the molecular 
mechanisms that even simple life today uses. There is a big jump in scale from large
non-living molecules to what we call “life” (but no new ingredients).


4. The breathtaking part is how the atoms available combined with various energy 
sources and non-living matter might have bootstrapped structures that could 
reproduce just well enough to be subject to the further random accidents and stress- 
that-filters of the environment which is what “evolution” is mostly about. 


I say “might” because what we know right now is still not strong enough to induce this
from scratch. It's important to realize that this is possible, but just how probable is not
understood at this point. Part of the difficulty is that the earliest life forms would
almost certainly not be DNA-based, and would lack many other characteristics we are 
used to seeing in life forms today. So the search is really for something that would 
work at all, that could then randomly change, sometimes into something more 
capable and efficient. 


There are a number of very good people who have been looking at these questions
over many decades. One of my favorites is the Nobel Prize winner Jack W. Szostak.
Take a look at his work (and he also has some excellent exploratory
videos on YouTube).
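

As a footnote to part 2 above, here is a minimal Python sketch of how one universal gate is enough to build up arithmetic. NAND is chosen here as the universal gate (the answer does not name one), and all function names are made up for illustration. Every other function is defined only by wiring NANDs together.

```python
def nand(a, b):
    """The single primitive: output is 0 only when both inputs are 1."""
    return 0 if (a and b) else 1


# Everything below is built purely from nand().
def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    return nand(not_(a), not_(b))


def xor_(a, b):
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def full_adder(a, b, carry_in):
    """Add three bits; return (sum_bit, carry_out)."""
    partial = xor_(a, b)
    sum_bit = xor_(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return sum_bit, carry_out


def add(x, y, bits=8):
    """Ripple-carry addition of two unsigned integers, wrapping at `bits` bits."""
    carry = 0
    result = 0
    for i in range(bits):
        sum_bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= sum_bit << i
    return result


if __name__ == "__main__":
    print(add(5, 7))
```

The same layering (gate, adder, arithmetic unit, and so on up) is the kind of "organization of simple parts" the answer is pointing at; nothing new is added at any level except arrangement.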


Alan Kay
Invented and designed a few user interfaces ... - 6y


Have screenshots of the Xerox Parc GUI ever been published? 


Sure, here’s one of Smalltalk-80 [1] and here's a much earlier one from Smalltalk-74 [2].
Here's an image of Smalltalk-76 [3] from a paper Dan Ingalls wrote in 1978.


Here's a Dorado screen with one of the Smalltalks (maybe Smalltalk-78) [4]. There are
quite a few of these to be found in Google images.


Footnotes 

[1] Redirect Notice
[2] Redirect Notice
[3] Redirect Notice
[4] Redirect Notice


Alan Kay
Still trying to learn how to think better - 6y
Why haven't we been able to recreate Bell Labs or Xerox PARC today? 


The computer parts of Parc are best thought of as parts of the ARPA-IPTO research
community: they were stocked with researchers who carried the ARPA vision, and who
learned their stuff and process in the ARPA community. Looking back on that
experience, and at other research setups over the last 50+ years or so, I think that “the
goodness of the results correlates most strongly with the goodness of the funders”.


For ARPA, it was a combination of a great, good, compelling, and romantic vision of a 
“destiny”, combined with a number of principles (I've written about these on Quora 
before), and especially the idea of finding the best people to try to find ways to realize 
the vision. 


Bell Labs was a different setup and process, but a good overlap was indicated in a sign 
that used to be in many places in Murray Hill: “Either do something very useful or very
beautiful”.


I think the chances of a few top researchers being around in any era are good, but it
has certainly been the case that “great funders” are -not- around in every era. 


Much of “management” is about (a) dealing with people who are not self-driven to
-find-out- and (b) striving to carry out plans successfully and on deadline.


On the other hand, it’s the nature of “invention level research” that only some of the 
process can be planned. In my experience great researchers are quite self-driven to 
-find-out- (they don't need much management), but they do need open-ended time 
(and a little space and resources). 


Good funding for research is rather like “MacArthur grants for groups”: researchers
are a kind of artist, and what doesn’t work out is not failure, but overhead for what
does. If the visions were lofty, then everything is lifted by orders of magnitude by what
does work out.


This very different process makes most “command and control” management systems 
in business and government today quite nervous and anxious. I think they would
much rather feel completely in control of mediocre processes than feel out of control 
with truly great processes. 


To make this explanation more complete (or more mysterious) we need to mention 
top scientific research institutions — such as the Santa Fe Institute — and note that 
they do exist today, and are organized very much as I have described Xerox Parc. Just
last week a magisterial book for the intelligent public appeared: SCALE, by
Geoffrey West of the Santa Fe Institute. Amongst the many important expositions of
the central subject, there is also an Afterword, which contains a section on how the
Santa Fe Institute was set up, the general nature of its processes, and some of the
influences on its design. 


The mysterious part seems to have something to do with “real science” and what 
much of computing seems to be about today, and what investors in computing are 
interested in. Parc was very much a “computer science” research institution in the old 
meaning of the term (where the word “science” was taken to be like the existing 
serious sciences). Perhaps what has lapsed in the field since then is part of the answer 
to this most puzzling question. 


Alan Kay
Have designed a few programming languages - 6y


What's Alan Kay's opinion of the D programming language? 


I don't think I know enough about D to have a supported opinion. I'm not a fan of C
or C++ (but not because they aren't like Smalltalk ...) 


Alan Kay
Have designed a few programming languages - 6y


How did Ivan Sutherland create Sketchpad? What programming language 
did he use? 


Take a look at Ivan's “Commemorative Lecture” for the Kyoto Prize for some insight 
into his special mind. Everyone should read Ivan’s thesis — Sketchpad, A Man-Machine 
Communication System (1963) — one of the greats of all time, and it also answers your 
question. 


Most programming in 1962 was done in machine code, despite a few somewhat 
higher level languages starting to work (e.g. Fortran and the very recent Algol). 
However, these languages were mostly to clean up the kinds of programming done in 
the 50s, and did not offer much to help build “semantic structures”. COBOL had a few 
more ideas but was tied very much to “unit records” and was quite ungainly to 
program in. 


Most of the good programming in “assembly code” back then was done in macro- 
assemblers, some of considerable power along a variety of dimensions (for example 
IBM's “Autocoder” allowed conditional expansion of macros to optimize general
routines for specific kinds of parameters). 


The underlying structures of Sketchpad came primarily from what was sometimes 
called “data-driven” programming (where pointers to subroutines useful to handle the 
data were embedded in the data). The MIT versions of this can be traced to Doug 
Ross, who had done some of the earliest computer graphic renderings (such as 
bouncing balls, etc.) at the Whirlwind project in the 50s, and was instrumental in the 
APT projects (Automatically Programmed Tools) which required quite a bit of 
structured graphics to represent the 2D and 3D shapes. This and the programming 
difficulties led Ross and his group to develop AED (Algol Extended for Design).


Meanwhile, Lincoln Labs had the very large TX-2 computer with an architecture that 
allowed easy multiprocessing — via multiple program counters etc. Among other 
things this allowed a programmer to make I/O gear with a minimum of hardware, 
most of which would be simulated via programs on the program counters. (We used 
this great idea heavily later for the Parc Alto.) 


The TX-2 had a large oscilloscope screen and an instruction that would plot a point on 
it. This took 20 bits of the 36 bit TX-2 instruction. More of the bits were used to hold 
an address to jump to if the light pen saw the dot that was just plotted. In Sketchpad, 
about 1/2 of the enormous machine’s memory and capacity was used to hold the
instructions that generated the dots and carried the back pointers for the light pen
(these would be shuffled to “twinkle” instead of the more distracting “swooping” when 
in order). All graphics beyond the point plot was done by software by Ivan. 


The programming of Sketchpad was done in assembly code and soon in the macro
assembly language called CORAL (Class Oriented Ring Associated Language), which 
was primarily the work of Ivan’s office mate at Lincoln, Larry Roberts (who later was 
the director of ARPA-IPTO and was the force behind the actual building of the 
ARPAnet). [1] 


One of the most interesting properties of this language was the support it had for 
“threaded lists” that were used to hold collections with two way relations. So for 
example, one would like to have a collection of all objects, but when encountering an 
object one would like to find out what collections it belongs to. For example, the light 
pen can see a point, the point knows which line it is in, the line knows which construct 
it is in, the construct knows what construct it is in, etc. 


Threaded lists were a very tricky way to do this, and it wasn’t too long before other 
ways were tried (note that it is still an issue in most programming languages today 
which lack associative functionality). 
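

The two-way relation described above can be sketched as follows. This is a hedged Python illustration of the idea only: it uses ordinary lists with explicit back references rather than CORAL's threaded ring structures, and Element and containers_of are hypothetical names.

```python
class Element:
    """A drawing element that knows both its parts and its owners."""

    def __init__(self, name):
        self.name = name
        self.parts = []   # downward links: what this element contains
        self.owners = []  # upward links: which elements contain this one

    def add(self, part):
        # Record the relation in both directions in one step, so that the
        # two views can never disagree.
        self.parts.append(part)
        part.owners.append(self)
        return part


def containers_of(element):
    """Walk upward to everything that directly or indirectly contains element."""
    seen = []
    stack = list(element.owners)
    while stack:
        owner = stack.pop()
        if owner not in seen:
            seen.append(owner)
            stack.extend(owner.owners)
    return seen


if __name__ == "__main__":
    picture = Element("picture")
    line = picture.add(Element("line"))
    point = line.add(Element("point"))
    # The light pen "sees" the point; walk up to find what it belongs to.
    print([e.name for e in containers_of(point)])
```

Keeping both directions in sync by hand is exactly the bookkeeping that a language with built-in associative functionality would take over.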


The other property that CORAL made easier was getting to the embedded procedure 
pointers — for example, the pointer to the procedure that could draw the object on 
the screen was located at the same offset from the front of the “n-component-element”
(as Ross called the storage structures). I think you can see that this
anticipated most of what the “not-quite-object-oriented-languages” like Simula (and 
later C++) were about — what was lacking was messaging and encapsulation — but 
the core ideas, especially in Sketchpad, led to the more general “real object” notions. 
(Sketchpad and Simula cannot be praised too highly.) 
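

A minimal sketch of the "procedure pointer at a fixed offset" idea, in Python rather than TX-2 macro assembly. The slot layout, the element kinds, and all names here are assumptions made for illustration; the point is only that the caller never asks what kind of element it has.

```python
DRAW = 0  # by convention, slot 0 of every element holds its drawing procedure


def draw_point(element, canvas):
    _, x, y = element
    canvas.append(("dot", x, y))


def draw_line(element, canvas):
    _, x0, y0, x1, y1 = element
    canvas.append(("line", x0, y0, x1, y1))


def make_point(x, y):
    # The procedure reference is stored inside the data itself.
    return [draw_point, x, y]


def make_line(x0, y0, x1, y1):
    return [draw_line, x0, y0, x1, y1]


def draw_all(elements, canvas):
    for element in elements:
        # Same offset for every kind of element: fetch the procedure, call it.
        element[DRAW](element, canvas)


if __name__ == "__main__":
    canvas = []
    draw_all([make_point(1, 2), make_line(0, 0, 3, 4)], canvas)
    print(canvas)
```

Adding a new kind of element means adding a new record layout and procedure; draw_all does not change. What this sketch lacks, as the answer notes for the early designs, is messaging and encapsulation: any code can still reach into the slots directly.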


Finally, there was the problem of dynamic constraint solving. Some constraints were 
easy to solve (e.g. this endpoint of a line is connected to that endpoint of another 
line), some could be solved by solving linear equations (e.g. using Gaussian 
elimination), and some were n-dimensional non-linear, and were handled by a 
“relaxation algorithm” Ivan got from an old book from the turn of the 20th century. 
This did n-dimensional hill climbing to minimize overall errors — this is a wonderful 
“engineering solution” where “close tolerance” was good enough (if something is off 
by a pixel it doesn't matter, etc). The particular way Ivan chose to do the solving would
generate the intermediate solutions — the drawings would crawl to the solved states 
— and if a system was underconstrained (such as a crankshaft) Sketchpad would 
generate all the solutions (and this would animate the crankshaft). 
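

The flavor of relaxation can be shown with a small hill-climbing sketch in Python. This is not Ivan's algorithm (see the thesis for that); it is a generic coordinate-wise search that nudges one variable at a time, keeps any nudge that lowers the total squared error, and records each intermediate state so the drawing can be seen "crawling" to the solution. The function names, step rule, and example constraints are all assumptions.

```python
import math


def distance_constraint(ax, ay, length):
    """Error function: how far point (v[0], v[1]) is from being `length` away from (ax, ay)."""
    return lambda v: math.hypot(v[0] - ax, v[1] - ay) - length


def total_error(values, constraints):
    return sum(c(values) ** 2 for c in constraints)


def relax(values, constraints, step=1.0, tolerance=1e-6, max_rounds=10_000):
    """Hill-climb on the total error; return the final values and every intermediate frame."""
    values = list(values)
    frames = [tuple(values)]
    error = total_error(values, constraints)
    for _ in range(max_rounds):
        if error < tolerance or step < 1e-12:
            break  # close tolerance is good enough
        improved = False
        for i in range(len(values)):
            for delta in (step, -step):
                trial = values[:]
                trial[i] += delta
                trial_error = total_error(trial, constraints)
                if trial_error < error:
                    values, error = trial, trial_error
                    improved = True
                    break
        if improved:
            frames.append(tuple(values))
        else:
            step /= 2  # no nudge helped at this scale, so try finer nudges
    return values, frames


if __name__ == "__main__":
    # A point that must be 5 units from (0, 0) and 5 units from (6, 0).
    constraints = [distance_constraint(0, 0, 5), distance_constraint(6, 0, 5)]
    solution, frames = relax([1.0, 1.0], constraints)
    for frame in frames:
        print(frame)
```

Drawing each entry of frames in turn gives the animation effect described above: the intermediate states are visible rather than hidden inside the solver.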


Pretty staggering for less than one year's work on a PhD thesis isn’t it?!!! 
Now go read his thesis! 


Footnotes 


[1] HOPL 


Alan Kay
I am the Alan Kay in question. - 6y


What did Claude Shannon do and accomplish according to Alan Kay? 


The Wikipedia article is a good place to start, and supplies references for reading by 
and about Claude Shannon, in my opinion a “real genius” who ranked with just a few 
people in the 20th century for unique insights and ability to go to great depths with 
them. 


That “Boolean Algebra” from the 19th century could be used to model and manipulate
switching problems (including computation) formed perhaps the most wonderful
Master's degree (from MIT) in the 20th century. The insight that this might be possible
apparently first occurred to Charles Sanders Peirce in the 1800s, but got lost, and 
Shannon took the possibilities very deep. 
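

A small Python sketch of the correspondence: switches in series behave like AND, switches in parallel behave like OR, so a circuit can be analyzed and simplified as an algebraic expression. The circuits and function names below are illustrative choices, not taken from Shannon's thesis.

```python
from itertools import product


def series(a, b):
    """Two switches in series conduct only if both are closed (AND)."""
    return a and b


def parallel(a, b):
    """Two switches in parallel conduct if either is closed (OR)."""
    return a or b


def stairway_light(s1, s2):
    """Lamp controlled from two places: on when the switches disagree."""
    return parallel(series(s1, not s2), series(not s1, s2))


def two_branches(a, b, c):
    """Circuit using four switch contacts: (a AND b) OR (a AND c)."""
    return parallel(series(a, b), series(a, c))


def shared_switch(a, b, c):
    """Algebraically simplified circuit with three contacts: a AND (b OR c)."""
    return series(a, parallel(b, c))


def truth_table(circuit, n):
    """Evaluate a circuit for every combination of n switch positions."""
    return {inputs: bool(circuit(*inputs))
            for inputs in product([False, True], repeat=n)}


if __name__ == "__main__":
    print(truth_table(stairway_light, 2))
    # The algebra says the two circuits are interchangeable; the tables agree.
    print(truth_table(two_branches, 3) == truth_table(shared_switch, 3))
```

The comparison at the end is the practical payoff: the distributive law removes a switch contact without changing what the circuit does.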


Getting “signal” back from “signal mixed with noise” was perhaps Shannon's most 
profound contribution, and his mathematical treatment of the general idea and its 
problems has had great impact both philosophically (it is a twin sibling of “entropy” in 
the physical world) and pragmatically (virtually all methods for transmission of 
information used today owe much to the general ideas and their implications).
Shannon was able to create definitions of “information” (and founded the field of
“Information Theory”) that are fundamental.


There are many useful perspectives on these ideas. A good start is to read the book by
Shannon, which has a simpler introduction by Warren Weaver: The Mathematical
Theory of Communication, by Claude E. Shannon and Warren Weaver (ISBN 9780252725487).


My favorite way to think about this is from a constructive point of view. One way to 
construe the result is: If you can state the signal to noise ratio, no matter how bad, you 
are guaranteed to be able to make a system that can recover the signal from the 
noise. (Where “system” is not just for transmitting information, but really entails any
kind of machinery.)


This result is against most of our intuitions, and I remember being just knocked over
by the coolness and profundity of it when I first went through the argument as a
teenager. 
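

One concrete form of the result above is the Shannon-Hartley capacity formula for a band-limited channel with Gaussian noise, C = B log2(1 + S/N). The answer states the idea informally; the formula, the function name, and the example numbers in this Python sketch are additions for illustration. The thing to notice is that the capacity stays above zero for any signal-to-noise ratio above zero, however bad.

```python
import math


def channel_capacity(bandwidth_hz, snr):
    """Shannon-Hartley capacity in bits per second.

    bandwidth_hz: channel bandwidth in hertz (must be positive).
    snr: signal-to-noise power ratio as a plain ratio, not decibels.
    """
    if bandwidth_hz <= 0 or snr < 0:
        raise ValueError("bandwidth must be positive and snr non-negative")
    return bandwidth_hz * math.log2(1 + snr)


if __name__ == "__main__":
    # Illustrative numbers: a 3 kHz channel at three very different noise levels.
    for snr in (1000.0, 1.0, 0.01):
        print(snr, channel_capacity(3000.0, snr))
```

Even with the signal a hundred times weaker than the noise (snr = 0.01), the rate is small but not zero; the price of bad noise is speed, not possibility.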


Alan Kay 
I am the Alan Kay in question. - Updated Sep 10


When did Alan Kay first meet Douglas Engelbart? Was Alan Kay present at 
the mother of all demos? 


I first met Doug when he gave a talk at Utah in early 1967. As I've said, he was like a
“Moses opening the Red Sea”. | then visited the ARC research project at SRI in Menlo 
Park and met many of the principal characters in the revolution they were fomenting: 
especially Bill English, Jeff Rulifson, Bill Paxton, etc. 


I got to use the system, and also observe some of the prep for the “mother of all
demos”. So I knew what was going to be shown, and I absolutely wanted to see the
actual gig. I had the flu at the time but went anyway.


It is hard to convey to people the actual scope of the demo: it was done in a very large
auditorium, and it used one of the few kinds of video projectors available (borrowed 
from NASA) — that could do very large situation displays with high resolution video. It 
was great! The video of the presentation doesn't provide the sense of “large” that was 
one of the most striking things about this presentation. 


The “mother of all demos” video and the ARC papers can be found by typing 
“Engelbart” into Google ... 


I see there is now a chatBOT answer to this question, and it is pretty much all
wrong. (Quora, why do something this poorly thought through?)




For Alan Kay, what was Claude Shannon like? 


Alan Kay 
I am the Alan Kay in question. - 6y


I didn't know Claude Shannon (but among other things he was Ivan Sutherland's
advisor at MIT, in part because he met the Sutherland brothers a number of times
when they were youngsters, when he was at Bell Labs). Marvin Minsky was a good
friend of Shannon's.


I know that he liked to ride his unicycle around MIT, and also was very good at making
complex things with his hands. 


A better question for Quora might be: “What did Claude Shannon do and
accomplish?”


Alan Kay
Still trying to learn how to think better - 6y


How many of the great engineers from Xerox PARC came from 
backgrounds with low family incomes? 


I don’t know (and most of the researchers at Parc were a mix of “scientists, engineers,
mathematicians, computerists”).


And I'm not sure what you mean by “low family incomes” (desperate? very little 
surplus?). I wound up working my way through college for a variety of reasons,
including “not a lot of surplus” in our family. 


Alan Kay
Still trying to learn how to think better - 6y


Alan Kay, how should libraries be designed so scientists and engineers can 
work together more? 

This “isn't quite a question” so here is “not quite an answer” — Doug Engelbart, who 
was a great thinker about the possibilities for helping humanity with the aid of shared 
interactive computing, wrote much and showed much about needs and ways to boost 
collaborative endeavors partly through shared “libraries” of knowledge. 


Try typing “Engelbart” into Google and follow the trail.


Alan Kay
I am the Alan Kay in question. - 6y
Alan Kay: How do you identify favorable exponentials? 


I think this question happened because I used “Moore's Law” as an example of a
“favorable exponential” that could be used to help think about “progress by going out
into the future and bringing it back”.


Most of the time I've first noted exponentials, and then tried to see what leverage 
orders of magnitude out would manifest. One of the simplest is the cumulative effect 
of scientific knowledge, which has been both exponential and favorable. The one that 
makes Moore's Law work — and many other things — is the reach and power of 
engineering at all levels of scale from nano on up. Quite a lot of what's become newly
doable just in the last 20 years staggers me (for example, when I was studying
molecular biology in the early sixties, gene sequencing seemed likely to be intractable).


The combination and intertwining of the science and engineering exponentials is 
“hyper-exponential”. 


Alan Kay
Still trying to learn how to think better - 6y


Why was Bob Taylor fired from Xerox in 1983? 


There were large “differences” with the new head of Parc. I had already left by then
and at that time was Chief Scientist of Atari. 


Alan Kay
I am the Alan Kay in question. - 6y


For Alan Kay, if the present is distracting, then how should you optimally 
ignore it? What did the team do at Xerox PARC? 


By coincidence, I got asked to do several lectures (this week) at Stanford on “Escaping
The Tyranny Of The Present”. I think the one from Tuesday just got posted (and
perhaps even the one from today). 


Take a look at: How to Invent the Future I - Alan Kay


Alan Kay
Still trying to learn how to think better - 6y 


What did JCR Licklider mean when he said if the internet succeeds our 
biggest problem will be to communicate with aliens? 


Lick basically meant that, as scaling increases, the chances to have the shared contexts 
that communication requires get thinner and scarcer. 


This is true between humans and humans (at some point you have to stop trying to 
send telegrams and send an ambassador who can negotiate meanings), computers 
and computers, software and software, and humans with computers and software.


All combinations start to be “communicating with aliens”.


Somewhere else in Quora I gave some other examples e.g. of what “objects that can
search & negotiate” could do to resolve lack of specific knowledge about APIs and 
object interfaces. 


In the end we were able to give Lick his "Intergalactic Network" but despite work at 
Parc, UCLA, MIT, etc. in the 70s and 80s we didn't manage to pull off solutions to the 
general problems. These are still open and more needed than ever. 


Alan Kay 


Still trying to learn how to think better - Updated 2y 


What do classical music lovers think of rap? 


I really enjoyed all of the other answers, perhaps my favorite collection of responses 
to date on Quora. 


Part of my background is 9 years of being a professional jazz musician (on guitar, 
starting in the late 50s), composing experience, many decades to date as an amateur 
classical pipe organist, and 2019 to the present as a Renaissance and baroque lutenist. 


One of the prime characteristics of serious art is to wake us up to pay attention to 
something important we've been missing. From that standpoint, I'd say that some rap 
is Art. 


There are so many kinds of music in the world that we have to think about it as 
“organized noise” (including “no-noise”, as in John Cage's “4'33"”). From this view, we 
have to view rap as Music. 


At the next level we can look at the contrasts between “pop music” and “developed 
music” (which would include both classical and jazz). There's not a lot of development 
in either the words or the other sounds in rap (but there is some). 


But the Zappa view (which is expressed in another answer) that rap is a form of 
American poetry, seems really good to me. Another answer, which also looked at this 
from the point of view of Homeric and other bards singing or semi-singing their lyrics 
as part of the same genre as rap, seems quite on point. 


And — there is “form” and there is “content”, and they aren't completely separable. 
Form without Content is like Technique without Art — these are chilly and worrisome. 


I like music that simultaneously affects us directly without analysis for one part of our 
brains and has nuances within nuances to be savored by other parts of our brains. 
Both going on at once is what makes life worth living. Both require some learning, and 
the second part can require a lot of learning and training (some really requires being 
able to play the music to touch more of it). 


I think rap could develop into an artform like this — this would take much of it out of 
the popular highly accessible place it is now. What I've heard so far isn't there yet. 


This development of a pop form has happened with other music that started out as 
pop — not just jazz, but some rock and roll. 


Perhaps the big question is will most who like rap now be interested in getting more 
sophisticated and developed as the art form develops? 


Alan Kay 
Still trying to learn how to think better - 6y 


What is the most expressive piece of art that you have ever seen? 


It's worth asking to what extent “art” can be sorted, even by a single person. I think 
the latter can at least create a category called “great” and include works in it (this is 
harder for a group to do, but it's possible for them to agree on some to be put in as 
“great”). 


On the other hand, I don’t think it's really possible to sort to a single “most X”, 
partly because: sorted via what? And partly because a goodly portion of expressiveness has to do with 
the transient moods of the observer. 


So I don’t think we can do much sorting between e.g. Bach and Beethoven, 
Michelangelo and Rodin, Raphael and Vermeer, etc. 


Alan Kay 
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 6y 


Why does imperfect computation have market power? 


Why does imperfect music — even bad music — have market power? 


Alan Kay 
Still trying to learn how to think better - 6y 


What's the point of life if we all die? 


The point of life is to live life with a point. We are the creatures who are aware that 
we'll die, but we also are the creatures who create our own reasons for living. Don't 
worry about the universe. Find a point. 


Alan Kay 
Still trying to learn how to think better - 6y 


Why is squaring both sides of an equation allowed? 


Let me confess that I've not read every answer. But the ones that I did read were all 
very concerned with “squaring” etc. 


The simplest answer (and I think the reason many people have difficulty with both 
arithmetic and especially algebra) is that you need to deeply internalize just what 
the "=" sign symbolizes and asserts: that there is the very same number on each side. 


In other words don't be distracted by the symbols and operations. One way to think 
about this is that “a number is all the ways you can make it” (i.e. it can be thought of 
as “processes” (an infinite number of them) as well as a “value”). 


This means whatever you can do to any number can be done on both sides of the 
"=" because there is just the same number underneath the gobbledygook on both sides. 


This is what "=" actually means. And it’s why algebra is actually quite easy rather than 
mysterious or difficult. 
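
A small worked example may make this concrete (the particular equation is arbitrary and chosen only for illustration). Each line does the same thing to the one number that both sides name:

```latex
\begin{align*}
x + 2 &= 5 && \text{both sides name the same number} \\
(x + 2)^2 &= 5^2 && \text{square that one number} \\
x^2 + 4x + 4 &= 25 && \text{still the same number on each side}
\end{align*}
```

Checking with x = 3 gives 9 + 12 + 4 = 25 on the last line. Note that this only runs forward: x = -7 also satisfies the last line (49 - 28 + 4 = 25) but not the first, so squaring preserves a true equation without being reversible.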


Alan Kay 
Invented and designed a few user interfaces ... - 6y 
What is the user interface of next generation messenger? 


Hi Jenifa. Could you try asking your question in a different way? This will help me 
understand what you would like to know ... 


Alan Kay 
I read a lot - Updated 2y 


What are the best books to understand computer science concepts? 


By far the best book for the general public, the one I always recommend, is “The 
Pattern on the Stone” by Danny Hillis. 


I have several thousand books about computation, but I can’t think of a great next 
level book. There must be one by now, so it will be interesting to see what other 
people who answer this question have to say. Meanwhile, I'll poke around and see 
what I can find for you. 


One way to approach this is as an analogy to atoms -> molecules -> life (physics, 
chemistry, biology). The easiest things to learn about computing are at the atomic and 
simple chemistry level, both in terms of organizing simple “switchable switches” to do 
more complex things, and also via simple programming. 


The key is to find ways to make the jump from “chemistry of large structures” to “life” 
— that is, to qualitatively different and more profound organizations of the basic 
materials. 


There is a fair amount of evidence that many of today's computerists don't find their 
way to this. But many of the most interesting things in computation — such as the 
Internet, etc — use organizational principles that are much more like “biology” than 
“chemistry”. 


Added Dec 2020, thanks to Robert Jain’s comment below: Hi Alan, what do you think 
about The Elements of Computing Systems - Building a Modern Computer from First 
Principles by Noam Nisan and Code by Charles Petzold? Kindly revert. 


I've written about the Nisan and Schocken book in other Quora answers. Basically: it 
can't be too highly praised for the aim, reach, and effort. 


Personally, I would do a few things differently (not as a complaint, but just from a 
different perspective), especially what kind of software to manifest on the HW, and 
perhaps a slightly different HW design, perhaps with completely concrete first 
principles (for example, like the start of the “Code” book). 


One of the many great strengths of N&S is that they furnish excellent simulators for 
every stage of the building process from first principles. And this means that one can 
learn by doing and debugging every step of the way. 


The Code book basically uses a very good approach for “principles” (I would take 
this path also), but I feel that it has way too many words and a few too many side 
trips for a starter book. That said, I think doing the first principles part of this journey 
using batteries, bulbs, and relays as Petzold did is exactly the right way to go. 


A large part of “User Interface” (on computers, or just writing or teaching) has to 
deal successfully with the limited number of ideas we can hold in our minds 
simultaneously (for simple things 7±2, more generally more like 4±3). Danny does this 
beautifully in his short clear book. “Code” violates this I think, so it works better as a 
secondary reference. The N&S book is large but rewards effort by being organized 
into careful sections that do respect our cognitive limitations. 


I think there could be an interesting and rewarding organization of the best methods 
of each of these books. 


Alan Kay 
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 6y 

If an alien civilisation came to Earth and found computers, how long would 
they take to figure out exactly how they work (reverse engineer them)? 

I agree with Vivian (I was just in the process of formulating something similar, but she 
said it shorter and better). 


On the other hand — not quite an answer — every computer person should be aware 
of what the Silverman brothers have done in making old hardware architectures work 
again. One of their most remarkable feats was to reverse engineer a 6502 CPU chip 
and to simulate it in several illuminating ways: Digging into Technology's Past 


Alan Kay 
Have designed a few programming languages - 6y 


If you want to program and you can't use your smartphone and computer 
how you can code? 
Programming is relatively easy, it's debugging that's the problem. Or as Tony Hoare 
once noted “Programming is easier than debugging, so don't use all your cleverness 
writing the program!” 


I started in the early 60s (in the Air Force), on two different mainframes which ran 
batch jobs 24 hours a day. Virtually all programs were written in assembly code. This 
was quite common almost everywhere. 


Programmers were the cheap resource (in the AF, they were enlisted ranks) and 
golden machine time was not “wasted” on us. So once a day we could get one of the 
computers and an operator (we couldn't touch the HW) for 3 minutes to run the 
program a few times and to get a binary memory printout of the after battle carnage 
plus register contents. 


The remedy was to “desk-check" and to write and use tracing macros. The results were 
very few bugs by today’s standards (at the cost of wakeups in the middle of the night 
when your “computer brain” had found a bug in your code). 


There's a wonderful true Don Knuth story that happened because he learned to 
program at the same time and in the same fashion..... 


This style of really thinking the code through is actually good to learn how to do even 
with today’s interactive conveniences (it is analogous to still learning how to 
remember what you read despite having the written record — it makes enormous 
differences in both efficiency and reach). 


Finally, as suggested in some of the other answers, you can get computing power 
today at the level of our wildest dreams in the early 60s for much less than the price of 
most pairs of shoes. Make the investment, and then train your mind to think and do in terms 
of processes. 


Alan Kay 
Still trying to learn how to think better - 6y 


What does the expression "Betcha by Golly Wow" mean? 


“Betcha” comes from “you betcha” which comes from “you bet” which means “YES!". 
“By golly” is one of many ways to say “By god" if you want to avoid really swearing. 
"Wow" means “Holy shit!” means “that's amazing” 

The whole phrase is a hyperbolic exclamation. 


Alan Kay 
Studied at University of Colorado Boulder - 6y 


How is studying undergraduate computer science at CU Boulder? 
They didn’t have it when I was there (I graduated in 66 with a math degree). 


Alan Kay 
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 6y 


How can I come up with an ISEF project relating to Computer Science? 


I'm not a fan of competitions where art is a big factor (and it should be here, not just 
for your learning and doing, but also to carry with you later). 


And ... it's usually much much better to do things that really intrigue you that also 
push into computing “just because”. When something good happens, you might also 
have a candidate for a science fair. 


An experience that made a big impression on me when I was still a student but a bit 
older than you, was to come in contact with some of the fundamental relationships 
that have to do with programming. In my case, there were a number of these — two 
biggies were “Meta II" by Val Schorre, and Lisp 1.5 (page 13) by John McCarthy. Both 
of these were tiny (~ 30 lines of code) self-definitions of systems that had an 
enormous amount of expressive power (the former was a self-bootstrapping 
compiler-compiler, the latter was "The Maxwell's Equations of Programming" in a tiny, 
cosmically powerful programming language). 


The process is to understand the self-definition, and then re-write it in some recursive 
language (could be something like Python, etc.) and this would produce the system 
like magic. (In practice there is a little more code needed around the kernel, but not a 
lot.) 
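
To give a feel for what "re-writing the self-definition in a recursive language" can look like, here is a minimal sketch in Python. It is in the spirit of an eval/apply pair, not a transcription of the Lisp 1.5 manual's definition: nested Python lists stand in for S-expressions, strings stand in for symbols, and the names `lisp_eval`, `lisp_apply`, `GLOBAL_ENV` and `FACT` are made up for this illustration.

```python
def lisp_eval(expr, env):
    """Evaluate an expression: strings are symbols, lists are forms."""
    if isinstance(expr, str):            # symbol: look up its value
        return env[expr]
    if not isinstance(expr, list):       # number, True, etc.: a constant
        return expr
    op = expr[0]
    if op == "quote":                    # (quote x) -> x, unevaluated
        return expr[1]
    if op == "cond":                     # (cond (test result) ...)
        for test, result in expr[1:]:
            if lisp_eval(test, env):
                return lisp_eval(result, env)
        return None
    if op == "lambda":                   # (lambda (params) body) -> closure
        return ("closure", expr[1], expr[2], env)
    if op == "label":                    # (label name fn): fn can call itself
        _, name, fn_expr = expr
        new_env = dict(env)
        # The closure keeps a reference to new_env, so binding the name
        # afterwards makes the function visible to its own body.
        new_env[name] = lisp_eval(fn_expr, new_env)
        return new_env[name]
    fn = lisp_eval(op, env)              # ordinary application
    args = [lisp_eval(arg, env) for arg in expr[1:]]
    return lisp_apply(fn, args)


def lisp_apply(fn, args):
    """Apply a Python primitive or a closure to already evaluated arguments."""
    if callable(fn):
        return fn(*args)
    _, params, body, env = fn
    local_env = dict(env)
    local_env.update(zip(params, args))
    return lisp_eval(body, local_env)


# A few primitives supplied by the host language (an assumption of this sketch).
GLOBAL_ENV = {
    "eq": lambda a, b: a == b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "t": True,
}

# Factorial written in the little language itself.
FACT = ["label", "fact",
        ["lambda", ["n"],
         ["cond",
          [["eq", "n", 0], 1],
          ["t", ["mul", "n", ["fact", ["sub", "n", 1]]]]]]]

print(lisp_eval([FACT, 5], GLOBAL_ENV))
```

The point of the exercise is how little is needed: a handful of special forms plus a rule for applying functions is enough to run recursive programs written in the new language.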


You should be able to find both of these references online (I found them for you 
(below)). Both of them are good science fair projects because they require some work 
and also a good explanation/poster by the student. 


They are also a gateway into the kinds of advanced thinking about computing that 
most people who learn programming never get to. 


Good Luck. 
John McCarthy et al Lisp 1.5 Manual (ca 1962) 


http://www.softwarepreservation.org/projects/LISP/book/LISP%201.5%20Programmers%20Manual.pdf 


Val Schorre Meta II Paper (1964) 


https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=OahUKEwij-JGiz4jTAhVB6GMKHeSfD4gQFggnMAE&url=http%3A%2F%2Fwww.ibm-1401.info%2FMeta-ll-schorre.pdf&usg=AFQjCNGjTqNju0YHmgWkGAz1ru8zreMIFw 


Alan Kay 
Invented and designed a few user interfaces ... - 6y 
Will graphical user interfaces ever be taken over by voice user interfaces? 


Will classical pipe organs ever be supplanted by singers (or vice versa)? Will people 
who like to make things with tools ever want to have servants do the best work? ... 


Alan Kay 
Still trying to learn how to think better - Updated 6y 

What made Xerox PARC special? Who else today is like them? 

A good book (pretty much the only good book) to read about the research 
community that Parc was a part of is “The Dream Machine” by Mitchell Waldrop. There 
you will find out about the ARPA (before the "D”) IPTO (Information Processing 
Techniques Office) set up in 1962 by the visionary JCR Licklider, who created a 
research community of 15 or 16 “projects”, mostly at universities, but also a few at 
places like RAND Corp, Lincoln Labs, Mitre, BBN, SDC, etc. 


There was a vision: “The destiny of computers is to become interactive intellectual 
amplifiers for everyone in the world pervasively networked worldwide”. 


A few principles: 


1. Visions not goals 


2. Fund people not projects — the scientists find the problems not the funders. 
So, for many reasons, you have to have the best researchers. 


3. Problem Finding — not just Problem Solving 
4. Milestones not deadlines 


5. It's “baseball” not “golf” — batting .350 is very good in a high aspiration high 
risk area. Not getting a hit is not failure but the overhead for getting hits. (As 
in baseball, an “error” is failing to pull off something that is technically 
feasible.) 


6. It's about shaping “computer stuff” to human ends per the vision. Much of 
the time this required the researchers to design and build pretty much 
everything, including much of the hardware — including a variety of 
mainframes — and virtually all of the software needed (including OSs and 
programming languages, etc.). Many of the ARPA researchers were quite 
fluent in both HW and SW (though usually better at one than the other). This 
made for a pretty homogeneous computing culture and great synergy in 
most projects. 


7. The above goes against the commonsense idea that “computer people 
should not try to make their own tools (because of the infinite Turing Tarpit 
that results)”. The ARPA idea was a second order notion: “if you can make 
your own tools, HW and SW, then you must!”. The idea was that if you are 
going to take on big important and new problems then you just have to 
develop the chops to pull off all needed tools, partly because of what “new” 
really means, and partly because trying to do workarounds of vendor stuff 
that is in the wrong paradigm will kill the research thinking. 


8. An important part of the research results are researchers. This extends the 
“baseball” idea to human development. The grad schools, especially, 
generally admitted people who “seemed interesting” and judgements 
weren't made until a few years down the road. Many of the researchers who 
ultimately solved most of the many problems of personal computing and 
networking were created by the ARPA community. 


Parc was the last of these “ARPA Projects” to be created, and because of funding 
changes from the Vietnam war, got its funding from a corporation rather than from 
ARPA-IPTO. But pretty much all of the computer people at Parc had grown up in ARPA 
projects in the 60s, and Bob Taylor, who set up the computing research at Parc, had 
been the 3rd director of ARPA-IPTO. 


Bob's goal was to “Realize The ARPA Dream". 


Parc was highly concentrated with regard to wealth of talents, abilities, vision, 
confidence, and cooperation. There was no real management structure, so things were 
organized to allow researchers to “suggest” and “commit” and “decommit” in a more 
or less orderly fashion. 


Quite a lot of the inventions Parc is most known for were done in the first 5 years by a 
rather small pool of researchers (Butler Lampson estimates about 25 people, and that 
seems about right). 


One of the most interesting ideas at Parc was: “every invention has to be engineered 
for 100 users”. So if you do a programming language or a DTP word processor, etc, it 
has to be documented for and usable by 100 people. If you make a personal 
computer, you have to be able to make 100 of them. If an Ethernet, it has to connect 
to 100 devices, etc. 


There was no software religion. Everyone made the languages and OSs and apps, etc 
that they felt would advance their research. 


Hardware was trickier because of the time and costs needed for replication and doing 
and making new designs. In practice this worked out pretty easily most of the time — 
via not too many meetings — and the powers of HW geniuses like Chuck Thacker. A 
few things — like the disk sectors and simple Ethernet protocols, etc. — were agreed 
on, mainly to allow more important things to be done more idiosyncratically. In 
practice, Parc designed and put in the field a variety of Alto designs (about 2000 Altos 
were built), MAXCs, Dolphins, Dorados, NoteTakers, Dandelions, etc over a period of 
about 10 years — i.e. quite a lot. 


There were key figures. For example, Parc would not have succeeded without Bob 
Taylor, Butler Lampson, Chuck Thacker, and a few others. 


I would call the first 5 years “effectively idyllic”. And the second 5 years “very 
productive but gradually erosive” (the latter due to Xerox's many changes of 
management, and not being able to grapple with either the future, or a possible grand 
destiny for the company). 


Alan Kay 
Have designed a few programming languages - Updated 3y 


Are Smalltalk and Pharo out-dated? 


Sure. What's disappointing is that Smalltalk is still quite comparable to most 
programming languages in use today (and not always negatively). 


This means that the computing establishment has done a terrible job in coming up 
with something qualitatively better over more than 40 years. 


Added: October 27, 2020. 


In a conscious analogy to Lisp, Smalltalk is made from just a few ideas with as much of 
the language as possible as “library”. Both languages have enough of a reflective 
meta-structure to allow many different pathways from the kernel. 


So, a simple way to improve things using Smalltalk would be just to completely rewrite 
the library from scratch. Dan Ingalls used to do this every few years at Parc, and we did 
quite a bit — though not enough — when Squeak was done in the 90s. 


One of the human "cognitive biases” is “loss aversion” and this ties in with others such 
as “investment value” (the time put into something makes it worth more), etc. There is 
also plain laziness, etc. All these have made the history of Smalltalk after Parc 
completely different from how we used Smalltalk within Parc. 


But the 40 years since 1980, with the immense scalings and other happenings that 
have taken place, really demand a deeper set of designs, some of which affect the 
underlying semantics, especially for message passing. 


I've written and talked about some of these elsewhere, so I won't reiterate them here. People 
who consider themselves to be computerists and who are interested in languages, 
should take a look at what languages need to be able to do. If they did, they could not 
just answer the question, but also start to provide some of the new problems to be 
solved and the start of solutions for them. 


Alan Kay 
Still trying to learn how to think better - 6y 


People who are really serious about software should make their own 
hardware. Why? 


The first part of the idea is that computing is about -processes- (all kinds) both 
understanding them and making them. 


At the next level of practicality, if one is making something that is supposed to be 
good for people to use — that actually might help them in important ways — then the 
design needs to be in terms of humans-with-processes, and shouldn't be limited by 
the particular hardware (and programming languages and systems) that vendors 
might be supplying. 


Finally, a new good and needed idea might not run fast enough or simply enough on 
existing hardware/software systems. 


In all of these cases, computer people should be able to deal with all the levels of 
organization needed to create the desired art. 


This is how personal computers, bit-mapped screens, the Ethernet, the Internet, 
smartphones, microcode, FPGAs, and (going way back) programmable computers got 
invented. 


Alan Kay 
Still trying to learn how to think better - 6y 


How can I understand physics instead of just memorizing equations and 
facts? It really frustrates me how I only have to memorize equations and just 
work my way through school without understanding anything. 

Originally Answered: How can I understand physics instead of just memorizing equations and facts? 


There are lots of good approaches here. I'd start with thinking about -processes- 
rather than “relationships” or “words” — and combine this with some experiments (a 
great thing about the dawn of “real science” in the 17th through 19th century is that a 
lot of the experiments were rather simple, and done without a lot of equipment). 


In this example, you can also use some analogies to more intuitive human-scale 
processes, such as: “pressure” of “something” for voltage, flow of “something” for 
current, different kinds of “pipes” that can impede or help the “flow”, and so forth. 
These can be very useful if you keep in mind that analogies need to be suspected, 
even as you are using them. 


Even though hardware stores aren't what they used to be, you can derive Ohm's law 
for yourself with a cheap battery e.g. a 9v, some insulated wire, and a cheap meter 
(you can get them for around $10 that will measure voltage, current, and resistance). 
For example, Amazon.com: Alloet Heavy-Duty Digital Multimeter Handheld Meter 
Voltmeter Ammeter Ohmmeter Capacitors Farads: Industrial & Scientific 
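
Once you have a handful of meter readings, a few lines of code can help you see the pattern for yourself. The sketch below assumes you measured the voltage across, and the current through, the same piece of wire or resistor under several conditions. The readings are invented purely for illustration (they are not real measurements), and `estimate_resistance` is a made-up helper name.

```python
def estimate_resistance(voltages, currents):
    """Least-squares slope of V against I for a line through the origin (V = R * I)."""
    if len(voltages) != len(currents) or not voltages:
        raise ValueError("need equally many voltage and current readings")
    numerator = sum(v * i for v, i in zip(voltages, currents))
    denominator = sum(i * i for i in currents)
    if denominator == 0:
        raise ValueError("all current readings are zero")
    return numerator / denominator


# Hypothetical readings, invented for illustration only.
volts = [1.5, 3.0, 4.5, 9.0]          # volts
amps = [0.015, 0.030, 0.045, 0.090]   # amperes

# If the ratio V / I stays roughly constant, that constant is the resistance.
ratios = [v / i for v, i in zip(volts, amps)]
resistance = estimate_resistance(volts, amps)

print("V/I for each reading:", [round(r, 1) for r in ratios])
print("estimated resistance in ohms:", round(resistance, 1))
```

With real readings the ratios will wobble a little. Noticing that they cluster around one value is the "derive it for yourself" step; the fitted slope just summarizes it.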


Richard Feynman once said "Science means you don't have to trust the experts”, and 
this is true for many of the fundamentals. If your school isn’t doing science via getting 
students to -start- with the experiments and then be guided to derive the formulas, 
then this is not really a science class, but more like a religion class, and I'd complain 
like mad! (You could also look at a great old high school physics curriculum done in 
the late 50s and 60s by top physicists — PSSC Physics — to find a lot of good ways to 
get into science via actually being a scientist and doing science.) 


Alan Kay 
Had something to do with "Object-Oriented Programming" - 6y 

What does a good object oriented design for a football game look like? 
Originally Answered: What does a good object oriented design for football game look like? 


I left inheritance out of the first Smalltalk because I didn't like the way Simula’s single 
hierarchies missed important relationships and I didn't like the lack of expression of 
meaning (you could refine a superclass or create a new species, etc., and Simula didn't 
care). I didn't like the inheritance system that eventually got put into Smalltalk (too 
much like Simula’s). 


However, being able to make instances from a general description is hugely powerful, 
and if your language has some of the dynamics of Lisp (Smalltalk did) you can 
organize further relationships as understanding deepens. For example, a good 
question to ask about any language is “how easy is it to make real objects in this 
language?" — if it claims to be an OOL, then “how easy is it to do (say) ‘slot 
inheritance’ ?” (i.e. inheritance through the property names rather than through 
superclasses). And: are variables themselves objects in the language, etc. 


So, a good first pass for any system is to first identify “ideas” which will have multiple 
instances and take this as your list of first classes (or prototypes, etc.). 


I like to make a scratch system when I’m designing, even if the language has lots of 
stuff in it, because “real thinking is hard” and I want to do -real thinking- when I 
design. The scratch system will usually bring up head-scratchers that need to be 
resolved. 


At the next level, note that most interesting systems require something more like an 
ontology than a set of relationships, and one that is only partly hierarchical. Minsky and 
Papert called system non-hierarchies “heterarchies”. 


Another thing to note is that objects in the real world have “roles” (a person can be a 
father, a son, a brother, a boss, an employee, etc.) This suggests more interesting ways 
of looking at objects. The “properties” of objects can include behaviors, so they are 
not just variables. 


And it's worth thinking about objects being partly made by “sideways compositions” 
as cooperating aggregates of simpler objects (this implies that objects are themselves 
namespaces). Many things that seem to require multiple inheritance can be more easily 
and cleanly described “sideways”. (This is kind of a parametric approach to thinking of 
complex entities — which is often simpler in important ways.) 
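
As a rough illustration of the “sideways” idea for the football question, here is a minimal Python sketch. It is one possible reading, not a prescribed design: the classes `Mover`, `BallHandler` and `Composite` are hypothetical names invented for this example. A player is assembled from cooperating parts, and the composite acts as a namespace that forwards each message to whichever part understands it.

```python
class Mover:
    """A part that knows about position on the field."""

    def __init__(self, x=0.0, y=0.0):
        self.x, self.y = x, y

    def move(self, dx, dy):
        self.x += dx
        self.y += dy
        return (self.x, self.y)


class BallHandler:
    """A part that knows about possession of the ball."""

    def __init__(self):
        self.has_ball = False

    def receive(self):
        self.has_ball = True

    def release(self):
        self.has_ball = False


class Composite:
    """An object made of named parts; it is a namespace, not a subclass."""

    def __init__(self, **parts):
        self.parts = dict(parts)

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails,
        # so forward the request to the first part that understands it.
        for part in self.__dict__.get("parts", {}).values():
            if hasattr(part, name):
                return getattr(part, name)
        raise AttributeError(name)


# A player built "sideways" from parts instead of from two superclasses.
player = Composite(mover=Mover(), ball=BallHandler())
print(player.move(3, 4))
player.receive()
print(player.has_ball)
```

A referee or a ball could reuse `Mover` without inheriting anything about possession, which is the kind of case that otherwise tends to push people toward multiple inheritance.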


And that “objects need to be made from systems of objects that are made from 
systems of objects ...” (i.e. instead of the inside being a list of variables and methods). 
Note that in standard languages you can put objects into variables but these objects 
cannot see each other. Making objects from systems of objects allows them to be, and 
to be used as, real modules in a recursive fashion. 


But — back to the beginning here — don't forget to start with a dog simple scratch 
system of just instances of relatively simple classes, and think of the more 
consolidated descriptions as being partially optimizations possible for later. Then think 
about ontologies instead of hierarchies. Then be able to criticize the language you are 
in if it doesn't have enough of a meta-system to allow you to build what you need as you 
understand more and more. (Don't try to be tricky: Tony Hoare likes to say that 
“Debugging is harder than programming, so don't use all your cleverness to write the 
program!") 


Alan Kay 
Have designed a few programming languages - 6y 


What are the benefits of learning to programme in Smalltalk? 


Take a look at In what way would learning Smalltalk make me a better programmer? 
for some opinions around a very similar question. 


Alan Kay 
Have designed a few programming languages - 6y 
In what way would learning Smalltalk make me a better programmer? 


I think the crux of a question like this is to note the old saying that “You can do COBOL 
programming in any programming language!” In other words, the start is what you 
bring to a programming language. If it's an already formed theory and skills about 
how you program, then you'll recapitulate these in every language you try. 


Another way to think of the question is via metaphorical analogies to atoms, 
molecules, chemistry, biology, ecology, etc. Computers can carry out “systems of 
processes”, and what we want to ask is "What is a system?”, "What is a process?”, and 
“How can we make them?” Keeping those questions in mind helps us gauge how well 
our programming languages are helping us or getting in the way. “Biology and 
beyond” is where computing needs to be, so this is a good way to gauge where a 
program/system is. 


If we are lucky, our programming language will have “high expression” in the system 
we are trying to make (this is rare!). If we are unlucky, we will think the kind of 
expression our programming language has does fit our goal, even when it doesn't. 
This can lead to much of the awful code that we see everywhere today. If we have 
learned something about design, we will try to understand the best ways to 
characterize our goal, and we will both fit, and invent and make, structures in our 
language to model our goal. 


One way to criticize programming languages in general is to compare to “human 
languages" (like the English I'm using to write this) by noting that there is enough 
“stuff” in English to allow me to talk about many things without having to make up a 
new term, whereas most programming languages are more like pidgin languages with 
almost no vocabulary, and the adding of new vocabulary is a difficult task for everyone 
(what does the new term mean?) and every -thing- (is the new definition well drawn 
enough?). Etc. 


On the other hand, the flexibility of English requires considerable learned skills to 
make larger structures that are understandable. In the end, design and style are the 
keys to making something worthwhile. In the “old days” (the old fogey remembering) 
super high skills and a good macro-assembler allowed lots of good stuff to be done in 
machine codes. However, this was a kind of brinksmanship that didn't scale well (and 
it certainly got me to start trying to invent higher level languages that were much 
safer, more expressive, but still had real flexibility at all levels). 


All of this means the programmer not only has to design processes, but has to design 
languages. Both of these are quite difficult, in part because we don't even know how 
to do either very well. We can compare this to the great difficulties in user interface 
design, and note that a programming language -is- a user interface, and that a 
program -is- a user interface (and both are usually quite terrible in these terms!). 


At the next level, most languages offer “features”, often in the form of a library or 
surrounding system. These features can sometimes be useful (a system for doing 
computer graphics or user interface, etc.) but also have the problem of (a) being 
tempting by already being there, and (b) being intransigent to changes that are 
needed for the new goal. What is the cost of bypassing either or both of these? (Note 
that this extends to the so-called “operating system” (which should be seen as a rather 
bad idea, but most think of having one as “normal”)). 


The way I'd answer your question is that the path towards better programming lies in 
becoming a better designer, and the ability of a programming language to create 
what is needed to realize your “better designs” is a good way to assess it. 


From this standpoint, Smalltalk is very much a creature of its time. A good way to get 
a sense of what this means is to take a look at a resuscitation of one of the Xerox Parc 
Smalltalks (from 1978). This was from a rescued disk pack that Xerox had put in the 
trash which happened to have an image of this system. Recently Dan Ingalls and Bert 
Freudenberg did most of the heavy lifting to get it going again, and I used it to make 
a presentation for Ted Nelson's 70th birthday festschrift. Here is that 15 minute or so 
presentation: Alan Kay's tribute to Ted Nelson at “Intertwingled” Fest


What you see here is a language/system/environment, etc made for a tiny computer 
but with “cosmic goals”. (It was this version of Smalltalk that Steve Jobs saw the next 
year in 1979 during the famous visit to Xerox Parc. You can compare this to what the 
Mac did and didn't do years later.) 


Almost 40 years later, we can note what we didn’t put in Smalltalk for a variety of 
reasons. For example, we only had cycles for limited multitasking (the original idea 
wanted something like what Erlang does do today). We had ideas about a real 
separation of optimizations from meanings, but didn't try to do this except for 
subclassing, which is not a good approach (would have been really nice to have taken 
a shot at this). We understood that -names- are too local for the kinds of scaling that 
were going to happen, and that we really needed “programming and finding via 
descriptions” — but we didn't do this. And so forth. What I'm driving at here is that — 
for any system — we have to try to understand it in the contexts that give us the best 
view of it, and that includes what was really good and also what was not done. 


Some of the good things about Smalltalk are noted by the other answerers of your 
question. But to circle back to the opening paragraphs, you can learn a lot about how 
to become a better programmer from existing programming languages if you've built 
enough of a design sense to be able to both criticize and also to see what is powerful. 


Alan Kay
Worked at Xerox PARC - 6y

Did some Xerox PARC engineers understand that they had the technology 
that can change the world in the 70's? 

Sure! Lee Felsenstein is quite right that the Parc researchers — we weren't all strictly 
“engineers”, but a mixture of different types who could all do some mixture of 
engineering, science, mathematics, computing, cognitive psychology, etc. — aimed 
explicitly at changing the world for the better along the lines put forth by the real 
pioneers such as Licklider, McCarthy, Minsky, Sutherland, Engelbart, etc. 


As I've noted in other answers, Parc can only be thought about in the larger context of 
the ARPA IPTO research projects started in the early 60s by Licklider and subsequently 
guided by Ivan Sutherland, Bob Taylor, Larry Roberts, etc. “Fumbling The Future” has 
too many errors to be a good history, but “The Dream Machine” by Mitchell Waldrop 
is quite accurate enough to glean some of the breathtaking sweep of this research. 


Parc came about when the pressures on Congress by the Viet Nam war protests led to 
an overly sweeping change to how DoD funded research, especially on college 
campuses — the Mansfield Amendment — and this got Taylor to worry that the larger 
aims would not get realized. This happened to coincide with Jack Goldman, the Chief 
Scientist of the hugely successful Xerox corporation, urging Xerox to set up a long 
range research center. The link was a physicist, George Pake, who had been chancellor 
of Washington University (St Louis), where there was an ARPA project (so he knew 
Taylor). Pake became head of the newly set up Parc (in July 1970) and convinced Taylor 
in the Fall of ‘70 to set up a computer lab there (Taylor hired me as a consultant soon 
after this). 


Taylor cherry-picked relatively recent young PhDs from the ARPA projects to be the 
initial researchers (he wanted young researchers who had grown up on “the ARPA 
dream”). 


In the first few weeks of 1971, a critical mass was achieved when Taylor convinced a 
core group of researchers — including Butler Lampson, Chuck Thacker, Dick Shoup, 
Charles Simonyi, Jim Mitchell, Ed Fiala, Willie Sue Hoageland, et al to move en masse 
to Parc. A second wave of similarly talented researchers started to drift over from the 
Engelbart project in nearby Menlo Park. 


This group was so stellar that I decided not to go to CMU to work on the Dynabook 
but to become a full time researcher and set up a group with the rest of these 
wonderful characters. We were most definitely all engaged to change the world for 
the better. 


Alan Kay
Still trying to learn how to think better - 6y


Why does my school aged kid ask inane questions like "what is gravity 
made of" or what is a "field made of" rather than just learn the material? 
How do I discourage this so they can get good marks? 


If you assert that you are not kidding here, I'll be happy to answer your question. 


Alan Kay
Worked at Xerox PARC - 6y


What was working at Xerox PARC in the early 1970s like? 


Best time ever! A truly wonderful combination of great people with great aspirations, 
abilities, and determination. And great (and needed) protection for the critical years by 
the “impresario” (as he liked to call himself) Bob Taylor. 


Alan Kay
Still trying to learn how to think better - 6y


What is Xerox PARC’s contribution to the world? 


A good book to read — which gives the larger story of the ARPA Information 
Processing Techniques Office (IPTO) research projects starting in 1962 and how Parc 
was part of this community and budded from it — is “The Dream Machine” by Mitchell 
Waldrop. 


I think the bottom line is that the particular concentration of interests and abilities of 
the Parc researchers constituted a critical mass to create one of the possible 
realizations of the “ARPA Dream” of personal computing and worldwide networking 
by using many of the previous inventions of world-wide computing research over the 
previous 10 years, and adding a few inventions on the way. 


Parc's output included a workable personal computer powerful enough to deal with all 
media, a display that could handle arbitrary images, a general user interface that was 
reasonably learnable by all ages and types of people which could mediate hundreds of 
thousands of applications, WYSIWYG interactions, “desktop media” authoring and use, 
a super-reliable-easily-deployable local area network (Ethernet), the first laser printer, 
the first dynamic completely object-oriented languages, development and operating 
systems, and hefty contributions to the Internet (Parc had its own “internet” before the 
official one). There were numerous “other things” as well, but this should be sufficient 
to answer your question. 


Alan Kay 
Still trying to learn how to think better - 6y 


If I am auditioning for a Shakespeare play, should I do an English accent for 
an audition or should I just use my normal voice? 


I agree with the other answers that you should concentrate on pulling off a speech 
before all else. Clarity and rhythm are the watchwords. Getting it over the footlights 
with “quiet conviction” and not too much movement — but with “body attitudes” — is 
the key to stage acting. 


Alan Kay
I am the Alan Kay in question. - 6y


Is Alan Kay's statement “science is a set of heuristics to get around buggy 
brains” a reference to Nietzsche's knowledge-as-preservation? 


No — but an interesting question. This statement is my translation into modern terms 
of some of Francis Bacon's observations and calls to action in his “Novum Organum 
Scientia” (ca 1620), where among many other things he identifies 4 “idols that humans 
worship”, of: the Tribe [human genetics and culture], the Cave [the internal thoughts of 
an individual], the Marketplace [inability of our languages to represent well enough], 
the Theater [basically: bad ideas promulgated by schools and academia]. 


He called for new methods — a “new instrumentality” (“Novum Organum”) — to 
combat these problems that would lead to new and more clear “knowledge” 
(“Scientia”). In modern terms we can still use “methods” and we can make the 
meaning more clear by substituting “heuristics”. 


Looking at “science” from this point of view, we should be able to readily see that it 
should be made the basis of any decent system of education in our time. 


“The Gay Science” is one of Nietzsche's books that I haven't read (but now will). From 
the Wikipedia article link you gave, some of his ideas are very similar to the earlier 
ones from Bacon. For many such thinkers, it was not until the 20th century that 
science had gotten to the place where the gap between “what's out there?” 
generating phenomena, and our abilities to understand and represent the causes of 
the phenomena started to look as though it might not be completely bridgeable (a 
doubly urgent need for heuristics to get around our “bad brains”!) 


Alan Kay
Invented and designed a few user interfaces ... - 7y


I really dislike (to put it mildly) the new Quora UI/UX. Both web and mobile. 
Am I the only one? 


Why introduce modes for expanding answers — by making a popup that forces 
closing to get back — when expanding them in the scrolling galley was the best 
solution, and that is what you had. This is “reinventing the flat tire”! 


Please change it back. 


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 7y


What is it like to work with Adele Goldberg? 

I first met Adele when I was putting together my research group at Xerox Parc in its 
earliest days, and she was a computer scientist (U of Chicago) finishing her PhD 
working with Pat Suppes at Stanford on an interactive system for helping young 
children learn mathematical logic. She was deeply interested in, and very able at every 
aspect of “computing, children, and curricula”. 


I managed to convince her that the ARPA/Parc dream of “personal computing” 
especially “for children of all ages” for “everyone in the world pervasively networked 
together” was the most important thing we should all work to make happen. 


Our research group achieved critical mass — and then some! — when Adele moved 
over from Stanford to join it. 


Adele is not so easy to describe because she combines a level of intelligence, abilities, 
energy, verve, and vivaciousness that almost no one else has. Anyone who has met her 
will know what I mean. She is a force! A person who makes things happen. 




When she was an undergrad, she told a friend she was going to get a PhD in 
Computing, and when her friend said “But you'll have to work twice as hard as a man”, 
Adele replied “Is that all?” 


It is also not so easy to describe the dynamics at Parc, especially in our Learning 
Research Group. I would describe one aspect of it as “a bunch of leaders and 
individual contributors who somehow listened to each other and sought the opinions 
of others”. Part of the success had to do with the shared visions — at the highest 
levels we were all trying to make the same thing happen — and conversely, that there 
was not a party line as to goals and projects — at the next levels each group and most 
members of most groups had very individual takes on “most desirable projects”. This 
resulted in a very wide research coverage of the personal computing and networking 
landscape. 


Adele started to set up our experiments with children, quickly became a major 
contributor in every aspect of what we were doing, and — to the extent that there was 
any organization in our group at all — she joined Dan Ingalls and myself as the main 
“fountains”. 


I'm failing at answering the question of “What was it like to work with her?” A weak 
comment is that it was just really great, and we all got a lot done together. One of my 
favorite memories is from the time that the top 10 executives of Xerox had been 
invited out to Parc (ca 1977) to learn about “personal computing”, and our group got 
the task of teaching them about “modern programming” (and all in 3 hours over 2 
days). Smalltalk-76 had just gotten working, and we decided to combine some of our 
experience with simulations (going all the way back to Simula and several versions in 
Smalltalk), to see if we could make an end-user programming experience in the 
domain of simulation that would allow the execs to learn and make simulations that 
were relevant to them. Adele took the lead on this project and steered the group to 
make the Smalltalk "Sim-Kit", a graphically acted out “job-shop” simulator. The execs 
could make icons for the “things being worked on” and program within the framework 
to model parameters they were familiar with. The result would be a dynamic portrayal 
of the simulation. For example: one of the execs ran a printed circuit factory at VariTek 
and was able to simulate the main line. 


My never to be forgotten memory was that Adele had her second child during the 
prep for this, and rigged up a harness so she could nurse the baby while writing 
Smalltalk code on her Alto! I'd give anything for a photo of this, but the memory is 
indelible! 


And we are all still great friends — and occasional colleagues — Adele is on the 
advisory board of Viewpoints Research Institute, and we've recently been discussing 
fruitful directions for the new Y Combinator Research initiatives in “Human 
Advancement” that have recently been set up (thanks mainly to Sam Altman). 


As a tribute to Adele, let me just say that in my opinion she cannot be praised too 
highly, not just for her many seminal technical contributions to both computing and 
education, and not just for her abilities to make things happen on her own and 
especially with others, but also for her lifetime of service to her field, which has 
included being President of the ACM, and a force on many critical committees over 
the years. 


Yay Adele! 


€ Alan Kay 
Still trying to learn how to think better - 7y 


How does one develop a really strong intuition for mathematics and 
physics? How can I improve my ability to solve "really challenging" 
physics/math problems? 

I enjoyed reading all of the other answers. One thing to ponder is that your two 
questions are not necessarily compatible. 


And let's use a less lofty label than “intuition” — how about “guessing”? 


(See Kahneman’s “Thinking, Fast and Slow” for his “expository fictions” of fast, 
non-cognitive, difficult-to-learn pattern-matching mechanisms and slower, more 
cognitive, faster-learning mechanisms.) 


For the purposes here, let's say that most of what is called “intuition” is in the fast 
non-cognitive realm (it's reacting to things that are similar to what has been seen 
before). This is the “practice” part of the advice. The practice is essentially changing 
and shaping the parts of the brain that do quick “commonsense reasoning” to add 
some of the “un-commonsense” that underlies math and the physical sciences. But it’s 
still mostly non-cognitive — so it pays to be careful here. 


Hadamard's book “The Psychology of Invention in the Mathematical Field” is quite 
revealing. He was a leading mathematician, and towards the end of his life he 
circulated a survey to his friends — his private list of the 100 greatest mathematicians 
and scientists of his day — to find out how they thought they did their thing. Most 
said they didn’t use symbols, but did use visualizations of various kinds, and about 
20% (IIRC), including Einstein, said they also had “muscular kinds of sensations”. 


Certainly, getting well versed in “visualizing” (not necessarily in “images”, but in terms 
of “configurations”) will help, especially for things that are well known already. 


But ... 


Since “guessing” can go almost anywhere (this is the problem with “intuition” that 
Prof. Brewer points out), it's worth getting skilled at heuristics that can help deal with 
both sides of your question — meaning: we'd like to (a) be able to “guess” pathways 
to take for “challenging math/physics problems”, but (b) we'd like to avoid the trap 
that Prof. Brewer mentions — to get wedged into the commonplace — and escape so 
that we can also be alert to new possibilities. 


A really simple exercise that can help a lot is to get out your trusty Moleskine 
notebook and on the left page write down what occurs when you look at a “problem” 
or “situation” of some kind. This will both capture what your pattern matchers are 
tantalized with, and it also performs a kind of release that will let you write down on 
the right page stuff that is as different from the left page as possible. 


I write down ideas also — mostly to get away from them: “Most ideas are mediocre 
down to bad!” To really think, you have to be able to escape from the delight of 
having an idea. (School is a killer here because so much of it is about answers rather 
than questions — many people are wiped out because they think they need to come 
up with an answer ...) 


Just have lots of ideas, and then put them somewhere that will not interfere with 
further ideas and directions. 


I like the idea that what's between my ears is not reality but just a construct of one 
kind or another, so playing with constructs — fictions — can help things quite a lot. 


Alan Kay
Invented and designed a few user interfaces ... - 7y


As voice interfaces get better, how will we draw the line between where we 
use visual UIs vs. voice? 


Jared Zimmerman’s answer is good. One thing he didn't mention — in general, with 
regard to semi-AIs as interface assistants — is trust — and how it can be dealt with 
continuously. 


For example, in a reactive GUI, we learn to trust most cause and effect relationships via 
having more or less what we hoped for manifest itself, not having our state go away, 
and by having a very reliable UNDO operation. 


For agent-based interfaces we have even less contact between our wishes and the 
means and possible side-effects of giving tasks to agents. The visual often helps when 
it is manifested as part of a task (driving directions, simple retrievals, transcriptions, 
etc.). 


In general, it's worth pondering the difficulties of communicating and working with a 
human assistant who has much more intelligence and both commonsense and special 
knowledge and context — and can explain reasoning better — than any AI. 


Alan Kay
Have designed a few programming languages - 7y


IT scientists often say about other scientists that they don't know how to 
structure software code. Are there ways to learn it without studying CS? 

Most code is not “well structured”, even by experienced programmers (even by 
experienced “structurers”). The early stages of programming are often more 
experimental than one would hope, and experienced programmers will sometimes 
take the trouble to try to re-organize the initial ad hoc structurings. 


The languages and development systems in general use are pretty terrible at helping 
reorgs. And most systems and languages wind up with dependencies that gradually 
become hidden to the point that reorging is both frightening and prohibitively 
expensive. 


Let's put “Humans First!” here, and mostly blame the current low state and low levels 
of programming and programming languages for a lot of the problems at the 
different levels of scale that span from non-pro-programmers to pro-programmers. 


A lot of the original impulses behind the development of “higher-level languages” — 
often called “problem oriented languages” (POLs) in the 60s — was the desire to 
“program in problem terms” and not have to worry about every detail of method, 
especially with regard to optimization. 


The very early FORTRAN had a foot in both worlds. It was hugely optimized to run well 
on late 50s and early 60s computers, but it almost only had one trick (represented by 
its name of FORmula TRANslation); it was almost assembly code in other respects. This 
led to spaghetti code for almost all programmers. 


A very early (1962) beautiful example for simple problems of interest to scientists was 
the first great interactive language JOSS. A slightly later, much more comprehensive 
POL was APL (a working system appeared in the late 60s). 


Two interesting systems today that can be useful for scientists are the venerable but 
capable Mathematica (a tool for thinking and modeling), and the relative newcomer, 
Julia, which attempts to be an efficient dynamic language that uses more or less 
standard expression syntax. 


Putting "Humans First!" again, we could ask what scientists actually need: (a) 
what they think they need, and (b) what they would conceive of needing if the language 
had the facilities. 


There is certainly a diversity of perceived needs, but I think most scientists would love 
to have something they could use to (1) represent theories, that (2) can automatically 
yield simulations that can probe the theory, (3) suggest more things to check for “out 
there”, and (4) that allows useful “tinkering”, but (5) somehow can help convert the 
tinkerings into (a) really fast executions that are (b) still in forms that help thinking 
about the processes. A side note is that lots of programming everywhere is done by 
making use of already existing packages via what are essentially subroutine calls -- the 
resulting programs might work, but the forms wind up being intractable to help 
thinking and further explorations. 


However, a little pondering might get us to realize that most IT pros need something 
like the 5 points above *and* coupled with an executable requirements language (we 
can see that the scientists also want to program in terms of “requirements” and that 
most of a solution for one will help the other, even though some of the needs are very 
different). 


“CS” still has a long way to go here! 


Alan Kay 
Have designed a few programming languages - Updated 7y 


What are the difficulties of creating "smart APIs", where two programs can 
automatically figure out a way to talk to each other? 


Great question! 


It's worth looking at some similar observations over the years (apologies in advance 
for the length of this reply). 


JCR Licklider -- the great originator (1962) and funder of ARPA-IPTO -- in 1963 called 
for an “Intergalactic Network”, and in one of the earliest memos -- to "the members of 
the Intergalactic Network" -- observed that the greatest problem that would be faced 
if we had one, would be “communicating with aliens”. He meant not just with other people 
across the globe, but with other programs, and that programs would have trouble 
communicating with “alien programs” and with “people”. 


(When asked "Why ‘Intergalactic’?" he said, "Engineers always give you the minimum, 
and I want a world-wide network, so I asked for an ‘intergalactic’ one"!) 


As is often the case, this led to a number of tracks. Most of computing ignored Lick's 
insights and requests. A determined group within ARPA-IPTO (that included Parc after 
it was formed) sought to make the "communication of bits" part scale to the planet, 
even with heterogeneous hardware. This resulted in the ARPAnet, and then the 
Internet. 


Also within this community were ideas about “software computers connected by 
neutral messaging” that could be a universal scalable way to unify -- and make mobile 
-- software systems, even with heterogeneous software systems (using similar 
encapsulation ideas and separating protocols from methods) that could freely be 
mapped onto the interconnected computers that the Internet would make possible. 
For example, Smalltalk at Xerox Parc in the 70s experimented with "proxy objects" 
which were gateways to objects in other systems and computers. 
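
To make the "proxy object" idea concrete, here is a minimal Python sketch. It is not how the Parc Smalltalk proxies were built; the names `RemoteSystem`, `Proxy`, `deliver`, and `Counter` are all hypothetical, and the "other system" is just an in-process object standing in for something that would really sit across a network. The point it illustrates is that the caller only sends a message name and arguments, and the receiving side decides which method answers it.

```python
class RemoteSystem:
    """Hypothetical stand-in for another system or computer."""

    def __init__(self):
        self._objects = {}

    def register(self, object_id, obj):
        self._objects[object_id] = obj

    def deliver(self, object_id, selector, args):
        # Only the receiving side knows the method behind the message name.
        target = self._objects[object_id]
        return getattr(target, selector)(*args)


class Proxy:
    """Local gateway: turns any method call into a message to the other system."""

    def __init__(self, system, object_id):
        self._system = system
        self._object_id = object_id

    def __getattr__(self, selector):
        # Called only for names not found on the proxy itself.
        def send(*args):
            return self._system.deliver(self._object_id, selector, args)
        return send


class Counter:
    """An ordinary object that lives in the other system."""

    def __init__(self):
        self.total = 0

    def add(self, amount):
        self.total += amount
        return self.total


other_system = RemoteSystem()
other_system.register("counter-1", Counter())

counter = Proxy(other_system, "counter-1")
print(counter.add(2))  # prints 2
print(counter.add(5))  # prints 7
```

The caller never holds the `Counter` itself, so the object behind the proxy could be moved or replaced without the calling code changing.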


The really good idea of “the Internet as a transparent cache for ‘intercommunicating 
object-processes’” never gained wide popularity (but see Reed's thesis below). 


The “discoverable communications” implications of Lick's “communicating with aliens” 
metaphors were less thought about and worked on. But there were a few interesting 
ideas. 


In the 60s, Hans Freudenthal -- with Lincos -- independently took a stab at 
bootstrapping a communications system from scratch with aliens. 


In the 70s, the Smalltalk group at Xerox Parc started to come up with ways to “find 
functionality” without having to know the local labels for it. For example, how can you 
find the "sine" function if you don't know its name? Or find a “sorting” facility if it isn't 
called "sort"? -- for example, in French it might be called "tri" or "trier" or "ranger" -- 
names don't scale well! 
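
One way to picture "finding functionality without the label" is to search by behavior: give a few input/output examples and ask which available functions reproduce them. The Python sketch below is only an illustration of that idea, not the mechanism the Smalltalk group used; `LIBRARY`, `matches`, and `find_by_examples` are hypothetical names, and the French labels are there just to show that the search never looks at them.

```python
import math

# Hypothetical library whose labels we do not know in advance.
LIBRARY = {
    "sinus": math.sin,
    "cosinus": math.cos,
    "racine": math.sqrt,
    "tri": lambda xs: sorted(xs),
    "inverser": lambda xs: list(reversed(xs)),
}


def matches(func, examples, tol=1e-9):
    """True if func reproduces every (input, expected_output) pair."""
    for arg, expected in examples:
        try:
            result = func(arg)
        except (TypeError, ValueError):
            return False  # wrong kind of argument, so not a match
        if isinstance(expected, float):
            if not isinstance(result, float) or abs(result - expected) > tol:
                return False
        elif result != expected:
            return False
    return True


def find_by_examples(library, examples):
    """Return the names of all entries whose behavior fits the examples."""
    return sorted(name for name, func in library.items() if matches(func, examples))


sine_examples = [(0.0, 0.0), (math.pi / 2, 1.0)]
sort_examples = [([3, 1, 2], [1, 2, 3]), ([2, 2, 1], [1, 2, 2])]

print(find_by_examples(LIBRARY, sine_examples))  # prints ['sinus']
print(find_by_examples(LIBRARY, sort_examples))  # prints ['tri']
```

A handful of examples underdetermines the function, so a match is evidence rather than proof; richer descriptions would be needed for anything beyond toy cases.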


Also at Xerox Parc during the same period, it was realized that it was easier to send a 
program to a server to be “executed in confinement” than to try to get all servers to 
understand all data formats (this is where PostScript emerged from a graphics making 
language to a "traveling communications protocol"). The insight is that a rather simple 
executable language can be made universal, but that there is no end to possible 
formats of “data”. This fits with the overall reasons for "send objects, not data". 


Doug Lenat in the 70s at Stanford showed with AM (a program that could discover 
mathematical theorems) and the later Eurisko (a more general “discovery system”) 
how evidence could be gathered, tests made, matching done, etc. without relying on 
local labels. 


Why AM and Eurisko appear to work 


Of great note is the 1978 MIT PhD thesis by Dave Reed -- who is also known as “the 
‘slash’ in TCP/IP” -- which sketches an architecture for an Internet-wide operating 
system of coordinated objects running in a common pseudo-time. 


David Reed Thesis (187 pages pdf) 


(These ideas were later validated in the Croquet System ca 2003 by Smith, Raab, Reed, 
and Kay) 


https://www.researchgate.net/publication/4029258_Croquet_-_A_collaboration_system_architecture 


The first great practical demonstration of “dynamic mobile processes” across 
heterogeneous hardware was done in "The LOCUS Operating System" by Gerry Popek 
and his group at UCLA during the early 80s. This wasn't extended beyond underlying 
Unix processes, but could have been. The first several chapters of the MIT Press book 
are excellent in laying out the general issues for machine independent processes. 


The LOCUS Distributed System Architecture 


David Gelernter in the early 80s called for “coordination meta-languages” from which 
asynchronous communications brokers could be made. Linda was an early provocative 
result. 


Linda (coordination language) - Wikipedia, the free encyclopedia 
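
For readers who have not met Linda, the core idea is a shared "tuple space": processes coordinate by putting tuples in and taking matching tuples out, without naming each other. The Python sketch below is a deliberately simplified, single-threaded illustration. The class name `TupleSpace`, the method name `in_`, and the use of `None` as a wildcard are choices made here for illustration, and unlike Linda's real `in` and `rd`, these versions return `None` instead of blocking when nothing matches.

```python
class TupleSpace:
    """Simplified, non-blocking sketch of a Linda-style tuple space."""

    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        """Place a tuple into the space."""
        self._tuples.append(tuple(fields))

    def _find(self, pattern):
        # A pattern field of None matches anything; other fields must be equal.
        for candidate in self._tuples:
            if len(candidate) == len(pattern) and all(
                p is None or p == f for p, f in zip(pattern, candidate)
            ):
                return candidate
        return None

    def rd(self, *pattern):
        """Read a matching tuple without removing it (None if no match)."""
        return self._find(pattern)

    def in_(self, *pattern):
        """Remove and return a matching tuple (None if no match)."""
        found = self._find(pattern)
        if found is not None:
            self._tuples.remove(found)
        return found


space = TupleSpace()
space.out("job", 1, "resize")
space.out("job", 2, "crop")

print(space.rd("job", None, "crop"))  # prints ('job', 2, 'crop')
print(space.in_("job", None, None))   # prints ('job', 1, 'resize')
print(space.in_("job", 1, None))      # prints None
```

The producer of a job and the worker that takes it never refer to each other, only to the shape of the tuple, which is what makes this a coordination scheme rather than a calling convention.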


There were also some too-narrow gestures at this problem -- for example, CORBA 
(and there are a number of such too-limited schemes today). 


Common Object Request Broker Architecture - Wikipedia, the free encyclopedia 


A hugely interesting subgenre of gaming instigated by Mike Genesereth at Stanford 
also has many clues and provocative possibilities -- the "game" requires the human 
players to write a software system that at a given point in the future can read from the 
Internet a just supplied description of a game, synthesize game playing agents from 
these descriptions, which can then play the game in an environment maintained by 
Genesereth. If we can see this as "creating automatic programmers who can solve 
specifications", then it adds a lot to our thinking here. 


Gamemaster 


There are also some “unfortunate” stabs at universal interfaces that pretty much 
missed the point of scaling, the Internet, and heterogeneity of software -- for example: 
CORBA. 


A good way to think this through is to start with the benign cases of "friendly aliens" 
and try to see how this might work. 


We could imagine an extension to TCP/IP in the form of a simple but universal and 
mobile language that could serve as a semantic basis for “matching of meanings” to 
see if agreements could be found strong enough to allow interchanges. 


Given the psychology of most programmers -- who almost never will do something 
just because it is a good idea -- the lingua franca will have to pay for itself in local 
ways to have a chance of adoption. 


However, we will also need a very comprehensive “expert-and-discovery system” that 
also has real world knowledge -- such as CYC: a vast combination of a system like 
Eurisko with extensive real-world knowledge and commonsense relationships -- that 
can deal with the implications of the expressed semantics. 


http://www.cyc.com/ 


The expert/discovery system can be large, whereas it is likely that the universal 
description language will be better if much smaller, if only to encourage learnability. 
(It's an interesting question as to size and features: most programmers are terrible at 
thinking in terms of and representing semantics -- unit tests are as close as they 
come.) 


This is a large important subject which I've already taken too long for Quora -- but it's 
worth looking at an interesting, and perhaps bootstrapping, use of these ideas, even 
before they are perfected for the larger scaling that is the main aim. If we take a look 
at modules “as they should be” we can readily see that they should be self-contained 
encapsulated entities for many reasons, yet most will need to make use of external 
resources themselves. On the other hand, we'd like to debug and vet them to verify 
that they run as desired, and also that any use of external resources is in accord with 
their actual needs and specs. (We should be able to see that ordinary type systems are 
not nearly semantic enough to fill the needs here.) 


Instead, we could imagine that each module has a “stub environment” whose 
members can be stand-ins for the eventual ones, are able to do enough to allow the 
module to be debugged, and can also act as semantic guards and matchers to verify 
that the external resources are as needed. These stubs would look a lot like what 
Lenat's "AM" system would gather up as it is "finding" “interesting” relationships. 

It is very interesting to try designs for these stubs! One of the simplest needs in 
systems made from any kind of modules -- even just functions -- is "referential 
transparency" -- and normal typing doesn't do it -- i.e. we want to be able to replace 
one sine function with another sine function, not just one function outputting floating 
point numbers with another. 
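
As a small illustration of a stub acting as a "semantic guard", the Python sketch below checks a candidate function against properties a sine must have, instead of only checking that it maps floats to floats. The names `looks_like_sine` and `require_sine`, the sample points, and the tolerance are all assumptions made for this sketch; passing the checks is evidence, not a proof, that the candidate is a sine.

```python
import math


def looks_like_sine(f, tol=1e-9):
    """Semantic guard: test properties of sine, not just the float -> float type."""
    samples = [0.1 * k for k in range(-30, 31)]
    checks = [
        abs(f(0.0)) <= tol,                               # sin(0) = 0
        abs(f(math.pi / 2) - 1.0) <= tol,                 # sin(pi/2) = 1
        abs(f(math.pi / 6) - 0.5) <= tol,                 # sin(pi/6) = 1/2
        all(abs(f(-x) + f(x)) <= tol for x in samples),   # odd symmetry
        all(abs(f(x + 2 * math.pi) - f(x)) <= tol for x in samples),  # period 2*pi
        all(abs(f(x)) <= 1.0 + tol for x in samples),     # bounded by 1
    ]
    return all(checks)


def require_sine(f):
    """Accept an external resource only if it behaves like a sine."""
    if not looks_like_sine(f):
        raise ValueError("candidate does not behave like a sine function")
    return f


def shifted_cosine(x):
    # A different implementation with the same meaning.
    return math.cos(x - math.pi / 2)


print(looks_like_sine(math.sin))        # prints True
print(looks_like_sine(shifted_cosine))  # prints True
print(looks_like_sine(math.tanh))       # prints False
```

All three candidates have the same type signature, which is why an ordinary type check could not tell them apart; the guard accepts the two sines and rejects `math.tanh`.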


Finally, for now at least, let me pull back from trying to communicate with “aliens out 
there” to ponder these problems "right here" -- because most of them can be found 
in corporate software done by the "same" IT organization. 


Some years ago, a well known corporate systems supplier asked me to look at their 
~350 million lines of code (of course, since a 400 page book only has 20,000 lines, I 
hardly looked at any code at all). But I was able to identify more than 30 quite 
separate ways of "sending messages" that had developed over the years, and none of 
the current schemes were coping with the scales that things had grown to -- so both 
integration and real-time late-binding responses to needs weren't happening. Reverse 
engineering was extremely difficult and expensive because they really couldn't say 
what most of the specs and requirements were and what the code actually did in any 
detail -- it just "did what it did". 


And, anecdotally, how many readers remember looking at a routine they wrote several 
years earlier and wondered “what did I mean by that label?”, “how did I actually write 
this code so it worked?”, etc.? 


*We* are aliens!


I'll end by just asking the question “How can we ‘ground’ the meaning of software we 
write in other than our own memories and memoirs?" Let's see what Quora comes up 
with .... 


Alan Kay
Have designed a few programming languages - 7y


How do I learn object-oriented programming?


Like “Computer Science” — which had a very different and much better meaning when 
first coined in the 60s (in part, it represented real aspirations towards finding the 
strongest notions of itself) — “Object-Oriented Programming”, when I coined the term
about 50 years ago, also had a different set of meanings and aspirations. 


Today in computing, we find ourselves in situations magnified many orders of 
magnitude by Moore's Law and the success of the inventions of Personal Computing 
and the Internet. I think it's worth trying to think things through carefully rather than
(a) trying to deal with the current conceptions of "OOP", and/or (b) going back rigidly 
to much of what was so powerful in the computing milieu 45 years ago. 


In rethinking things, we find some old friends in ideas — such as protected modules 
that are “whole computers”, non-command messaging, requirements and constraints, 
transactions, “before-and-after”, meta-levels, separating meanings from methods, 
“objects” as “servers”, and so forth. And many of the old dangers: race conditions, 
indeterminacy, scaling, reformulation, hopeless (and needless) complexities, and many 
more. 


We are still faced with the large problems of design at too many levels, because it is 
rare that each concern and requirement can be satisfied in complete isolation, and the 
number and kinds of degrees of freedom that seem to be needed preclude much of 
classical mathematical treatment in favor of building and debugging. 


There are a number of truly important ideas — many from the “deep past” in our field 
— that need to be comprehensively understood and pondered — both for intrinsic 
beauty, and to ask what they mean for today. 


For example, the first completely startling-system-with-objects that knocked me on
the head 50 years ago was Ivan Sutherland's Sketchpad, already 4 years old. The
“entities” in Sketchpad were mostly graphical — they showed up on the display as
“things” made of “lines” that were made of “end-points” that had “x-values” and
“y-values”, but they were not “data structures” (they were “behavioral” and the Sketchpad
programmer could not do anything like an “assignment statement”).


Instead, Sketchpad was “programmed” by a combination of hand constructed 
“objects” whose behaviors were “impressed” on the objects in terms of “constraints” 
(which were the dynamic requirements for each object). The Sketchpad system itself 
dynamically “solved” the intertwined requirements — and this let the “programmer” 
think in as linear terms as possible to allow most brain-cells to be used for the difficult 
problems of design and purpose. 


The field back then — and this author — were not really up to carrying this model
forward — instead, we found ways to approximate some of the ideas, but at real cost
to the integrity of aim that Sketchpad brought. There were a few important exceptions 
over the years. 


But today, it is possible to really address these important ideas about “designing and 
programming in requirements” with complete separations of “tuning” and 
“optimizing”. 


A lot of the best systems in the future will be a lot more like Sketchpad in approach
than most systems today. We need to work to make this happen! 


There are a number of other really important ideas from the early 60s that have 
missed becoming part of our basic tools and thoughts today (partly from the faddism 
that has always been rife in computing, partly because our not-quite-a-field cares no 
more about history than any manifestation of pop-culture, etc.). 


A good example is how John McCarthy in the early 60s was able to advance
states in time without race conditions and without violating “logical and functional
relationships”. He called the mechanism “fluents”. Today, one of the terms used for
this is “computing in pseudotime”. The idea should be familiar: instead of destructively
changing things, retain a history of the changes going forward, each new event
representing an increasing point in pseudotime, which becomes an obligatory
parameter on every object — the aim is consistency of relationship for each
pseudotime. With a few more niceties we wind up with a universal use of “atomic
transactions”, “versions”, etc.
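
A minimal sketch of the pseudotime idea follows. It is not McCarthy's formulation or any particular system's, and the names `Fluent` and `World` are hypothetical. It assumes one global pseudotime counter and that each committed transaction gets exactly one new tick:

```python
import bisect

class Fluent:
    """Hypothetical value that keeps its whole history, indexed by pseudotime."""
    def __init__(self, t, value):
        self._times = [t]
        self._values = [value]

    def set(self, t, value):
        # Never overwrite: only append a new version at a later pseudotime.
        if t <= self._times[-1]:
            raise ValueError("pseudotime must increase")
        self._times.append(t)
        self._values.append(value)

    def at(self, t):
        # Latest version whose pseudotime is <= t.
        i = bisect.bisect_right(self._times, t) - 1
        if i < 0:
            raise KeyError("no version that early")
        return self._values[i]

class World:
    """Advances pseudotime once per committed transaction."""
    def __init__(self):
        self.now = 0

    def commit(self, changes):
        # changes: list of (fluent, new_value) pairs. All of them become
        # visible at the same new pseudotime, so a reader asking "at t"
        # never sees half of a transaction.
        self.now += 1
        for fluent, value in changes:
            fluent.set(self.now, value)
        return self.now
```

For example, moving 30 units between two accounts with `world.commit([(a, 70), (b, 30)])` leaves `a.at(t) + b.at(t)` equal to 100 at every pseudotime `t`, and the old state at pseudotime 0 is still there to be read.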


It should be clear that there is no conflict at all between the idea of protected 
modules, non-command messages, and “functional relationships". 


There are many more important parts to think about and rethink — but this is already 
too detailed. 


A good heuristic for my own thinking about our new not-quite-a-field is to not just 
“think systems" (avoiding lower level mechanisms), but to “think Biology”. The latter is 
tricky because not all the systems principles that can be and are used by Biology are 
within the current scales of computing. But, if you think about “cells as objects” then 
many important principles quickly come to mind. (And if we look around for the 
system that is most like this today, we find Erlang and its derivatives ...) 


My conclusion here is that in the early stages of any field, it is not a good idea to
get rigid and dogmatic, even religious, about “principles that are not strong enough 
to be principles”. 


The Turing Award winner Tony Hoare had a great observation on us in general: 
“Debugging is harder than programming, so don't use all your cleverness in writing 
the program” (this goes for design too!) 


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 7y


What are 5 most interesting and well-paid jobs in CS? 


“Real Science" like any great art form is a “calling” not a “job”. 


I realize that the use of “Science” in “Computer Science” has now lost its original
meaning and intent, and is now more like the usage in “Library Science”. However, as
an old fogey who learned and loved the original meaning, I will stand pat on it.


In a “calling” the questions are first and foremost how to grow to be effective at your
art. Beyond this, most who are called are greatly aided by good funders, and usually 
impeded by poor ones. Much of the effectiveness of the ARPA-IPTO community in the 
60s and its last add-on of Xerox PARC in the 70s was due to how great funders were 
able to support “people of possibilities” to realize their potential in the wonderful 
vision that was the foundation for this research. 


There were people who were already “called” to the Romance of Computing
before the ARPA funding was started in the early 60s. These formed the early 
community, and they attracted others, many of them younger with even fresher ideas. 


Now, what was your question again? 


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 7y


How can I learn to create computer-generated music?


Which part of your question is more important: “music” or “computer-generated"? 


“The music is not in the piano”! So if “music” is the more important part — I think it
should be — then developing the internal musical tendencies given to us by our 
genetics is the most important thing: a lot more singing and dancing as a start! 


“The music is not the notes”! whether on the page or played by anything ... similarly: 
“The speech is not the phonemes"! 


Music is a way to express both emotions and ideas, especially those that are both 
important and don't fit well into speech. 


Both speech and music lend themselves to technical analysis and skill at every level — 
But “Technique should be the Servant of Art, not the Master"! 




To developed musicians most “computer-generated ‘music’” doesn't sound remotely
musical from many perspectives: phrasing and prosody, rhythm, taste, etc. CGM is a
kind of “designer jeans” (i.e. expensive pretentious dungarees). It's a caricature of 
music. 


However, in the hands of developed sensitive musical sensibilities, the computer 
becomes a wonderful new way to create musical possibilities and expression. 


In our age of “fantasy sports” and other “fantasy pursuits” there are many kinds of 
“fantasy music”. It's worth putting in the effort to start getting into the real deals. 


Alan Kay
Still trying to learn how to think better - 7y


What is BIOS? What are its characteristics and functions? 


With all due respect, this is not a “theoretical computer science question”, or even a 
“computer science” question (“real computer science” (RCS) is distinct in many ways in 
what it is trying to do from practical engineering). I use RCS here because this
distinction has been almost lost for most computerists. 


A good enough answer to this question can be found in the Wikipedia article on BIOS
(as with many answers to many questions asked on Quora).


Alan Kay
Have designed a few programming languages - 7y


How does one mathematize code and write bit-identical code for all 
platforms? 


I don't think I have used the term “mathematized code”. I have suggested (and my
research community has used) the tactic of coming up with “mathematics” for 
important ideas, and then trying to turn the math into a runnable computer language 
(also a kind of math but often less “relational"). And to try to separate the -meanings- 
from the optimizations, etc. For example, that was one of the directions taken in the 
early 60s when there was a desire for a meta-language for grammars that was formal 
enough to be reasoned about, expressive enough to allow grammars to be readily 
understandable and communicated, and also to be “runnable” to make parsers. The 
Alex Warth “Meta” example in the talk (from 2006 or 2007) is a particularly good one: 
the grammars are clear, and so are the translation relations. 


The “bit-identical” computation ideas came from a desire in the 60s to ensure 
semantics over many platforms (and this is oddly still not a top priority in the 
computing world). In the ARPA-Parc community — which was working on personal
computing pervasively networked over the world, and thus on many hardware platforms
and likely many OSs — this was a critical need.


The byte code virtual machine idea traces back to an actual piece of hardware — the 
Burroughs B5000, mostly designed and invented by Bob Barton — which had the aim 
of being able to directly manifest a reasonable hardware platform for “higher-level 
languages" (such as Algol). The result was quite wonderful, and at many more levels 
than just byte-codes (all computer people should know about and understand this 
amazing design from the early 60s). 


Various people realized that the B5000 architecture would make a great and highly 
portable virtual machine — a notable example was Wirth’s and Weber's Euler 
language (CACM Jan-Feb 1966), whose byte-codes were later executed via microcode 
by Weber. These were called P-codes and were used in a variety of systems, including 
Pascal. 
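
To make the “byte code virtual machine” idea concrete, here is a toy stack machine. The opcodes are invented for illustration and are not the B5000, Euler, or P-code instruction sets. The point of the sketch is only that the same list of small numbers gives the same result on any host that implements `run`:

```python
# Hypothetical opcodes for a toy stack machine (not a historical set).
PUSH, ADD, MUL, SUB, HALT = range(5)

def run(code):
    """Interpret a flat list of byte-codes; return the top of the stack."""
    stack = []
    pc = 0
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:
            stack.append(code[pc])  # the operand follows the opcode
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == SUB:
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")

# (2 + 3) * (7 - 4), written in postfix order as a compiler might emit it.
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 7, PUSH, 4, SUB, MUL, HALT]
```

Calling `run(program)` evaluates the expression on the operand stack. Porting the "machine" to new hardware means rewriting only `run`, not the programs.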


Once networking is added, with the possibility of a particular system running 
simultaneously on different hardware and even migrating computations, the idea of 
“bit-identical” arose. This ups the ante from “very similar computation” to the need for 
“exact computation” (and we have to figure out what we mean by this). 


There is now more work to be done, and the tendency of computer people to borrow 
code to save work gets in the way. Just to mention an extreme case, when the Croquet 
system was being built about 15 years ago, the “bit-identical” Squeak Smalltalk was 
used as the foundational environment. Croquet is based on Dave Reed's 1978 MIT 
PhD thesis about replicating computations in pseudo-time over the whole Internet, 
and so the logically identical objects on each machine have to always come up with 
the same results going forward in pseudo-time. Eventually it was discovered that,
while Squeak Smalltalk did indeed ensure that floating point arithmetic, etc., was the
same regardless of platform, it did not ensure that transcendental functions were
bit-identical. (Things worked better after this was fixed!)
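
The gap between “very similar” and “exact” is easy to see in miniature. The sketch below is not the Squeak or Croquet code, and the helper names are hypothetical. It compares results by their IEEE 754 bit patterns, and uses a change in the order of additions as a stand-in for two platforms whose math libraries differ slightly:

```python
import struct

def bits(x):
    """The 64-bit IEEE 754 pattern of a Python float, as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def bit_identical(xs, ys):
    """True only if two result sequences agree in every bit."""
    return len(xs) == len(ys) and all(bits(a) == bits(b) for a, b in zip(xs, ys))

# Two "replicas" summing the same numbers in a different order.
replica_a = (0.1 + 0.2) + 0.3
replica_b = 0.1 + (0.2 + 0.3)
```

The two replicas differ only in the last bit, which is harmless for most numerical work but enough to make replicated objects drift apart if every machine must compute exactly the same future. Note also that `0.0 == -0.0` is true while their bit patterns differ, so “equal” and “bit-identical” are different tests.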


Note that “byte-codes” are not the issue here, but “preserving meanings” is. So, if a JIT 
translator is used, however it is done, it still needs to “preserve semantics” — everyone 
agrees on this, just to different extents on what “preserving meanings” actually means! 


It's worth thinking philosophically about these issues — and to also ponder the 
pragmatic implications of (a) world-wide computing (b) dynamically migrating 
computing (see Gerry Popek's LOCUS system from the 80s), and (c) perfectly
replicatable computing. Note that as a system is scaled, it becomes more and more
worthwhile to architect ways to get desired results in the face of more ways to make 
errors. It's likely that heuristics that will converge computations to obey constraints 
will be more useful in the long run than to try to make “perfect clockwork". This is an 
“Engineering + Shannon” solution to something that can be difficult when trying for 
exactitude. 


Alan Kay
Invented and designed a few user interfaces ... - 7y


How are programs with a user interface made? 


There is “what?” and there is “how?”. We can guess that “what?” is difficult in numerous
ways, because there are many poor UI designs to be seen these days, even with
components supplied by OSs and libraries. Quite a bit of good UI design requires
deep understanding of how we humans “work in the world”, and it is certainly the case
that most computerists today have missed this learning.


There are many more roads that could be taken besides the combination of ideas in
the Parc GUI (overlapping windows (views), icons, pointing, modeless, 2+D,
models-views-controllers, etc.). Here is an essay I was asked to write in 1989 about this
approach: http://www.vpri.org/pdf/hc_user_interface.pdf


The “how?” part is relatively easy if you don't have to optimize: e.g. if you can refresh 
the "virtual display screen” after each graphics change, if you can afford to march 
through all of the boundaries of the regions that are supposed to be sensitive to the 
pointing action, and so forth. 


It will really be a worthwhile exercise for you to do a simple from-scratch system that
can be manipulated by an end-user. 


You will have arrays of 32 bit numbers that will represent pictures with a pixel being 3
10-bit fields — for red, green, blue — of each number. Write a routine that will copy
one of these — it will have associated with it the x and y and width etc. of the picture
— into a very large array that will be your simulated display (you will hand this off to
the OS for display after each major action).


Associate a "depth number" with each picture and put pointers to the pictures into an 
array that will hold the pointers in deepest order first. Write a routine to clear the 
display array and copy each picture into it. This will manifest as the familiar “2 1/2 D" 
graphics most often used. 
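
A small sketch of these two routines, in Python rather than C to keep it short. It assumes three 10-bit color fields per pixel (so that they fit in one 32-bit number), a flat row-major display array, and the convention that a higher depth number is nearer the top and therefore drawn later. The names `Picture`, `blit` and `redisplay` are made up for the example:

```python
from dataclasses import dataclass

def pack(r, g, b):
    """Pack three 10-bit channels into one 32-bit number (assumed layout)."""
    return (r << 20) | (g << 10) | b

@dataclass
class Picture:
    x: int
    y: int
    width: int
    height: int
    depth: int    # assumed convention: higher number = nearer the top
    pixels: list  # row-major, width * height packed numbers

def blit(display, disp_w, disp_h, pic):
    """Copy one picture into the display array, clipping at the edges."""
    for row in range(pic.height):
        dy = pic.y + row
        if not 0 <= dy < disp_h:
            continue
        for col in range(pic.width):
            dx = pic.x + col
            if 0 <= dx < disp_w:
                display[dy * disp_w + dx] = pic.pixels[row * pic.width + col]

def redisplay(display, disp_w, disp_h, pictures, background=0):
    """Clear, then paint the deepest picture first so nearer ones overwrite it."""
    for i in range(len(display)):
        display[i] = background
    for pic in sorted(pictures, key=lambda p: p.depth):
        blit(display, disp_w, disp_h, pic)
```

No per-pixel depth test is needed: the overlap effect comes entirely from the order of copying, which is why changing one depth number and redisplaying is enough to “raise” a picture.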


To deal with pointing actions, write a routine that marches through the picture 
rectangles (you have the x, y, width and height as parameters for each picture) to see 
which rectangles the pointer point is in (note that you generally want to give 
precedence to the topmost one). 
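
The pointing routine can be sketched the same way. This is only one plausible shape for it: `Rect` is a hypothetical stand-in for the picture record, and it again assumes that a higher depth number means nearer the top:

```python
from dataclasses import dataclass

@dataclass
class Rect:
    name: str
    x: int
    y: int
    width: int
    height: int
    depth: int  # assumed convention: higher number = nearer the top

def contains(r, px, py):
    """True if the point lies inside the rectangle (right/bottom edges excluded)."""
    return r.x <= px < r.x + r.width and r.y <= py < r.y + r.height

def hits(rects, px, py):
    """March through every rectangle; return those under the pointer, topmost first."""
    under = [r for r in rects if contains(r, px, py)]
    return sorted(under, key=lambda r: r.depth, reverse=True)

def topmost(rects, px, py):
    """The rectangle that should normally get the pointing action, or None."""
    found = hits(rects, px, py)
    return found[0] if found else None
```

Marching through every rectangle is the unoptimized “how?” described earlier in this answer. It is fine for a learning exercise, and the full list from `hits` is kept because the topmost rectangle is not always the one that should respond.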


You might try to set up two UI actions: 


-clicking-, which will bring a picture to the top (meaning that you've changed its depth 
to “highest” which means that the redisplay will display it last, and this will appear to 
have moved it to the top), 


and 


-dragging-, where you will have to remember where the pointer was so you can 
appropriately change the x and y parameters of the picture you are pointing at so the 
redisplay will appear to move the picture to its new x and y location. 


This will motivate writing routines for figuring out pointing actions. For example, a 
“click” will start with a “mouseDown" and be followed in some max time with a 
“mouseUp” action. If the max time is exceeded then this would be called a 
“mouseStillDown" and this might be associated with a -drag- action (as given above). 
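
One way to sketch this classification is a tiny state machine fed with timestamps. The 0.3 second threshold and the name `dragEnd` are assumptions for the example, since the text only says “some max time”:

```python
CLICK_MAX_SECONDS = 0.3  # assumed threshold, not a value from the text

class PointerTracker:
    """Turns raw mouseDown / mouseUp / clock ticks into higher-level actions."""
    def __init__(self):
        self.down_at = None
        self.dragging = False

    def mouse_down(self, t):
        self.down_at = t
        self.dragging = False
        return "mouseDown"

    def tick(self, t):
        # Called periodically; reports when the button has been held too
        # long to count as a click.
        if self.down_at is not None and t - self.down_at > CLICK_MAX_SECONDS:
            self.dragging = True
            return "mouseStillDown"
        return None

    def mouse_up(self, t):
        if self.down_at is None:
            return None
        was_click = not self.dragging and t - self.down_at <= CLICK_MAX_SECONDS
        self.down_at = None
        self.dragging = False
        return "click" if was_click else "dragEnd"
```

A press at time 0.0 released at 0.2 yields `"click"`. A press held past the threshold yields `"mouseStillDown"` on each tick, which is where the drag routine would update the picture's x and y.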


And this will motivate writing an event handler of some kind. A simple one is a kind of 
conditional scheduler consisting of a loop that has tests for various pointing actions 
with the “then” part calling the appropriate action routine. 
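
Such a “conditional scheduler” could look roughly like the following. It is a sketch only: a real one would poll the OS for input rather than drain a prepared list, and the event names and `handlers` dictionary are hypothetical:

```python
from collections import deque

def run_event_loop(events, handlers):
    """Drain a queue of (kind, payload) events through a chain of tests."""
    results = []
    queue = deque(events)
    while queue:
        kind, payload = queue.popleft()
        # Each test's "then" part calls the appropriate action routine.
        if kind == "click":
            results.append(handlers["click"](payload))
        elif kind == "mouseStillDown":
            results.append(handlers["drag"](payload))
        elif kind == "quit":
            break
        # Anything else is ignored in this sketch.
    return results
```

The return value exists only so the behavior can be inspected. In the exercise each action routine would change a picture's depth or position and then trigger a redisplay.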


And this will motivate getting around many of the things that are too primitive in C to 
be reasonably used directly. 


For example, a good ploy in lower level programming of any kind is to set up your 
own multitasking scheduler. A very simple way to do this is “loopless” programming, 
where you don't trap control into long loops, but instead thread the control through 
each iteration in all the “virtual loops” that need to be happening in each simulated 
process. 
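
Here is a minimal sketch of “loopless” programming, with hypothetical names. Each simulated process keeps its own loop state and exposes a `step` that performs exactly one iteration. The scheduler is the only real loop, and it threads control through every process in turn:

```python
class Blinker:
    """A 'virtual loop': each call to step() is one iteration of it."""
    def __init__(self, name, count, trace):
        self.name = name
        self.remaining = count
        self.trace = trace

    def step(self):
        self.trace.append(self.name)  # the "body" of the virtual loop
        self.remaining -= 1
        return self.remaining > 0     # False means this process is finished

def run_round_robin(processes):
    """Give every live process one iteration per pass; never stay inside one."""
    live = list(processes)
    while live:
        live = [p for p in live if p.step()]
```

With three blinkers asked to run 3, 1 and 2 iterations, the trace interleaves them rather than finishing one before starting the next. The same pattern works in C with a struct per process and a function pointer for `step`.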


All that I've mentioned can be done in a few hundred lines of C at the most. What's 
intricate about it is the various higher-level architectures put on top of C that you 
need to think through. 


Speaking of that, one of the biggest motivations for “real object oriented 
programming” was computer graphics, starting with Ivan Sutherland's Sketchpad 
system in 1962. So many things in interactive graphical systems are much more 
“object-like” than simple data-structure-like. 


A final thing you might try as part of your test architecture is to ponder that “things 
made from things that are made from things” appear overlapped on a 2D screen. So a 
pointing action is really a ray going down through all of the layers. This means that it 
is not clear which of the objects’ handlers should be called. 




Note that it is often the case that it is the “bigger” “outermost” picture that will have
the “inner” ones as parts. It's often the case that the “outer” object will need to have 
the first cut at figuring out what to do — on the other hand, many operations will 
want to be done with the “innermost” object. This suggests that some form of 
associations will need to be set up to deal with “ownership” (a two way relationship) 
and with bundling all the hits from a pointing action into a structure that can help sort 
out who will do what. 
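
As one possible shape for this, and only as a sketch, each picture can hold a link to its owner and a list of its parts, and a pointing action can be bundled into a chain running from the outermost hit to the innermost. The class name `Node`, the use of absolute coordinates, and the rule that later parts lie on top are all assumptions of the example:

```python
class Node:
    """A picture that may own inner pictures; ownership is a two-way link."""
    def __init__(self, name, x, y, w, h):
        self.name, self.x, self.y, self.w, self.h = name, x, y, w, h
        self.owner = None
        self.parts = []

    def add(self, part):
        part.owner = self        # the part knows its owner...
        self.parts.append(part)  # ...and the owner knows its parts
        return part

    def contains(self, px, py):
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

def hit_chain(root, px, py):
    """Nodes under the point, outermost first, following ownership inward."""
    chain = []
    node = root if root.contains(px, py) else None
    while node is not None:
        chain.append(node)
        # Later parts are assumed to be drawn on top, so search them first.
        node = next((p for p in reversed(node.parts) if p.contains(px, py)), None)
    return chain
```

With the whole chain in hand, the first element can take “the first cut” at deciding what to do, and the last element is the innermost candidate. Either one can reach the other through `owner` and `parts`.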


All of the above starts to motivate moves away from simple data-structures to more 
comprehensive ways to deal with relationships. 


And this should motivate a move away from C and C-like languages, which are almost 
as far away from these ideas as machine code. 


But, for learning, it can be a good thing to deal directly with some of these 
mechanisms. 


Meanwhile, the much larger part of all of this is to learn “the art of understanding 
humans" and this leads to “the art of user interface design”. 


Alan Kay
Had something to do with "Object-Oriented Programming" - 7y


What is a good advanced OOP course that is equivalent to CS108 at 
Stanford? 


There is much too much unsubstantiated criticism on the web. I think real criticism
should be just the opposite: highly supported with lots of counter examples, etc. 


So, having looked at the webpage for CS 108 at Stanford — Object Oriented
Programming — I'll beg off from attempting a critique for lack of time and energy,
and instead will offer an emotional reaction: I didn’t feel any warmth for the approach,
especially with regard to any reasonable meaning for “Science” in “Computer Science”.
Nor did I feel warmth if this were intended to be a course at any level in “Software
Engineering”. 


What to do? I'd suggest looking at some of the most interesting examples over the 
years: from Ivan Sutherland's Sketchpad — the main inspiration for my interest in this 
— to what was accomplished at Xerox Parc — to what is being done today in ways 
that are most like some of the original ideas in OOP as I envisioned it, e.g. the
massively parallel organizations and philosophy of approach of Erlang. 


Along similar lines, I would avoid trying to get a picture of “real OOP” from looking at
stuff done in C++, Java, etc. (it's not that you can’t set them up for “real OOP” but that
the standard practices are not in “real OOP” style, but essentially in a parallel (I think
massively weaker) style of Abstract Data Structures that I think has always been a real
dead-end and distraction from what is needed).


All that said, we need something much stronger today — qualitatively stronger — 
than even the best ideas of OOP from the past. A recent answer of mine on Quora 
touches on this. 


Alan Kay
Had something to do with "Object-Oriented Programming" - 7y


What is Alan Kay's definition of Object Oriented? 


The other answers are well worth reading. An interesting question arises about 
terminology: to what extent is it reasonable to try to retain original definitions versus 
having a term get “softer” in meaning as time passes and new and additional 
conceptions are formed? (The latter happens in part because of the ways we 
spontaneously evolve our languages, so trying to legislate against it doesn't work.) 


However, I think “colonizing” a term to get reflected status is not a good practice,
because it really weakens the central ideas (for example, C++ is called an
“object-oriented language” — and most people argue “it is!” — but it is much too far from the
ways I was thinking to be included in any definition I would come up with).


Part of the problem here is that I made a mistake with how the term was coined — I
should have picked something else — in hindsight: “server-oriented programming”?


In any case, the “server” metaphor — mentioned by Eric des Courtis below — is good 
enough here (since I've written about and answered questions about “objects” in the 
past, and especially in the “The Early History of Smalltalk” written for the ACM ca 
1993). 


There is a bit of a red herring here because the power of a comprehensive universal 
building block can also be its downfall. For example, a “server” could choose to allow 
its encapsulation to be violated — e.g. by making its services too closely resemble data
structures acted on by procedures. Here, in my opinion, we would be simulating quite 
the wrong kinds of things, and devolving back into weak and fragile programming 
styles. (That is my view of what has mostly happened with “objects” — “real objects” 
never showed up because most people wanted to retain their data oriented style, etc.) 


We could argue that the definition was incomplete — even: poor. It allowed too much
discretion on the parts of programmers (this was partly because we used it for our
own purposes at Parc and thought — not terribly accurately — that -we- had
sufficient discretion to use it wisely (certainly not 100% of the time!)).


And ... this November will be the 50th anniversary of my “recognition” of the powers
of the simple idea of making everything from “encapsulated servers exchanging
non-command messages”.


Even though — in my opinion again — the simple idea of making computation 
systems “be like computers on the Internet” still isn't generally recognized, much more 
is needed in programming and systems building than “a great recognition” from 50 
years ago which had enormous relative power for about two decades. 


For example — today and tomorrow — we should be programming in terms of 
“requirements and goals” that can manifest a workable system (possibly needing a 
super-computer). 


We should be able to optimize a system like this without touching the requirements 
and goals part, etc. The feeling of such programming should be like the CAD-SIM-FAB 
cycles in more developed parts of engineering. In other words, we want to devote 
most of our attention to “the whats” rather than “the hows”, use most of our energy
for design, and we'd like to “ship the design!” (that would be a good slogan for the 
next few years). 


Just as the great language Lisp was first for programming, but then became “a very 
high level machine code" for higher level ideas, we should see that what was powerful 
about direct programming with the kinds of object systems we made at Parc and 
subsequently so many years ago, should now be retained for structural integrity and 
other pragmatic reasons, but that the code should now be automatically written from 
much higher level sources. This doesn't mean that “objects are now hidden”, but that 
they should be part of the “modeling and designing of ideas and processes” that is 
the center of what programming needs to be. 


Alan Kay
Worked at Stanford Computer Science - 7y

How difficult is Stanford's CS231N?


This is not a “Computer Science question", it is a question about academics at a 
particular institution. 


Alan Kay
Had something to do with "Object-Oriented Programming" - 7y

At OOPSLA 1997 Alan Kay gave a talk titled "The computer revolution 
hasn't happened yet". What parts have materialized thus far and if not why 
not? 

The excellent answers by Mark Miller and by Tyson Edwards both get at some of the 
problems with “a real computer revolution”. 


One of the analogies, and questions, we used 50 years ago to think about this was to 
the printing press, and “when did the real printing revolution actually happen?” Back 
then, I started to argue that the real printing revolution started in the 17th century,
and what happened in the 150-200 years in between was what McLuhan had 
identified as “the new media starting out by absorbing the forms and content of the 
old media”. In other words, the printing revolution was really about the invention of 
science and modern governance, not about getting the Bible to more people more 
cheaply. People thought it was about making old writing more “convenient”, but it 
actually toppled whole social and belief systems. 


You can see that the important part of this phrase involves trying to figure out what a 
real computer revolution might be all about — and that — if it is a -real- revolution, 
that it won't be about what first seems to be in view. 


Here is an essay I wrote about 10 years ago on this subject — it has some of the
arguments, and some examples of what an actually “new literacy” might be like — but 
don't let it get in the way of your own thinking about this important question! 


http://www.vpri.org/pdf/m2007007a_revolution.pdf


Alan Kay
Have designed a few programming languages - 7y


Is an iPad that boots into Squeak (or Croquet) effectively Alan Kay's 
dynabook? 

The short answer is “No”, despite the iPad having vastly more computing, storage, 
display, network, and battery resources than the “Dynabook model” thought up in 
1968 and sketched in several papers in 1972 and afterwards:
http://mprove.de/diplom/gui/Kay72a.pdf


A good historical “first pass” essay is a 1975 conference paper I wrote that showed
many examples of the first 3 years of work on this project, and the goals that are
expressed are in a more useful and less diffuse form than the above “vision paper”:
http://mprove.de/diplom/gui/Kay75.pdf


This was written on the “interim Dynabook" personal computer at Parc that is 
described in the paper, and was printed on the first laser printer (another one of the 
early fruits of Parc at that time). 


“No”, because the Dynabook was primarily a service-idea, and the iPad does not
deliver enough of the services envisioned 48 years ago to qualify. An answer longer
than is suitable for Quora — but better than I can do here — can be found in an
Afterword I wrote a few years ago for a book on children’s programming:
http://www.vpri.org/pdf/hc_what_ls_a_dynabook.pdf


I've got the latest iPad Pro here to try things. I can draw, so I'm naturally interested in
how well one can draw on it (partly because this has something to do with children
and others learning to draw on it). In the Notes app, I think this is “good enough”. In
some of the other apps things are too slow, but let's not blame the iPad. 


However, where does one put the stylus? Steve sent me one of the pre-release 
versions of the iPad for comment (and you could get a capacitive stylus that would 
work with it, sort of). But where would you put it? I asked him that, and he said “no
one will use a pencil”. I said “That's good for 2 year olds and 92 year olds, but
everyone else uses and learns to use tools that are extensions of our gestures and 
hands”. (...) 


The Dynabook is exhibited with a keyboard because even then it was known that a 
keyboard for rapid typing would be needed (this because of experience with the first 
good tablet based system at RAND, the Engelbart system with chord keyboard, and 
some experiments with touch sensitive keyboards on displays (the possibility is 
mentioned in the first Dynabook essay)). We liked the idea of a displayable keyboard 
but worried that it wouldn't be good enough to really use for serious typing ... 


The iPad Pro does have a keyboard accessory but the way it works (not well) with the 
iPad and the stylus (not well) indicates perhaps a desire to just compare feature points 
with the MS Surface. This somehow reminds me of the difficulty of lifting out a 
MacBook Pro from a book bag — it's too slippery and it is too heavy not to have 
something to help! It seems to have been designed only for looks ...


(On the other hand, note how long it took Apple to even create a package for the iPad 
that combined a stylus, touch sensitive display and a keyboard (which were the 
original specs for the Dynabook) ... and perhaps the Surface had more to do with this 
than actual realizations at Apple?) 


It would be disingenuous to not point out a design flaw in the 1968 Dynabook model
here (just because we didn't think about it, and it wasn't thought up and done until
much later). And that is that it would be great to have a touchpad below the keyboard,
despite having a touch sensitive display and a stylus. But, after this invention in the
80s and many years of experience with it, we can note that the iPad Pro keyboard is
not equipped with this so useful affordance (yikes!).


But the larger issues have to do with intended uses. The current question assumes that 
Apple will allow dynamic languages to be run, and (I suppose) for projects and
interactions that involve code to be shared. Let us suppose that they make a better 
integration of the keyboard and fix the stylus and inking speed problems. Now we 
have to look at what owners of Dynabooks — especially children — should be able to 
do with them. Besides referring to the above “What is a Dynabook?” essay, I'll try to 
make a few more comments here. 


And these comments are most critical of the general attempts over the years to come 
up with a really good computer system for “children of all ages” — and especially my 
own attempts with the enormous help of many talented colleagues. We didn't get 
there (and part of this was our own fault). 


Cutting to the chase, the most “Dynabook-like” system for children was “Etoys”, which 
was done in the late 90s and released in the early 2000s. This incorporated many really 
good ideas from many sources — including the LOGO ideas of Papert, the Smalltalk 
ideas from Xerox Parc, the Hypercard ideas from Apple, some of the Engelbart 
collaboration ideas, etc. — into a system for 5th graders that allowed a number of 
breakthroughs in helping children to think more powerfully. 


http://www.vpri.org/pdf/rn2005001_learning.pdf @ 
http://www.vpri.org/pdf/rn2005002_authoring.pdf @ 


A side note here is that Etoys absolutely required “ideas from a community” (just as
personal computing and networking required the whole ARPA community's
contributions).


For example, Hypercard was more naive than we were thinking — but it was a perfect 
hit in the user space we cared about — we needed to see Hypercard to help us 
understand what we were fumbling with. 


Similarly, we needed to see Mitchel Resnick’s “Starlogo” even though we'd been 
beating the “simulation” drum for a long time. Starlogo was a really great example of 
an important part of simulation for children that helped every other more general idea 
we'd been trying to have. 


And I could go on and on here about many other truly important perspectives on
children, powerful ideas, and computing that came from others. 


And, if we stay with the originally stated ideals of the Dynabook — and of the ARPA 
community of the 60s — then there were still quite a few important things missing in 
Etoys. One of the most important was age ranges — many ideas and facilities for 
younger and older children were left out of Etoys — and along with them scales of 
aspiration, really good integration with the Internet, and the amount and quality of
the packaging and support that is needed. 


Some of this was supplied by a descendant of Etoys — the very popular Scratch 
system (which was done by some of the same researchers). But Scratch unfortunately 
removed many important features of Etoys — especially in the media and system 
simulation areas — so it can't be put forward as a real improvement. 


I think the most important point — and a way of answering this important question in 
a useful way — is to consider “personal dynamic media” as a kind of “embodied 
curriculum for civilization” and to ask and answer questions along these lines. What 
should we be trying to have people learn — especially children — and what are the 
best ways of helping them? 


To me, the most interesting part of these questions is that I'm confident that both can
be answered not just in the context of the original Dynabook idea, but in the context
of today. There's quite a bit of work needed to catch up to 1968 (but this is doable
now), and there's quite a bit more work needed to catch up to our world in 2016,
and this requires another large push like the one my research community made in the
60s and 70s to invent personal computing and world-wide networking.


But we can do it, and I think we must!


Alan Kay
I am the Alan Kay in question. - 7y


Is Alan Kay correct that building software is still like the design and 
construction of ancient structures like the Pyramids, in the era before 
architecture? 


The essay quoted was written in 2001. And, 15 years later, it's still a good question, 
but I think harder to answer, especially for me: I haven't been tracking software
development generally for a number of years (and there's been a lot more of it). 


In any case, my essay wasn't really about Egyptian pyramids, but about whether there 
could be a real “computer science” that could lead to a real “scientific software 
engineering” in similar ways to the revolutions in designing, testing, and fabricating 
physical, electrical, and biological structures over the last 70 years or so. 


If I were to write the essay again, I'd try to be more clear about the above paragraph,
and would focus less on “late-binding” (most definitely needed), and more on 
“debuggable designing” (I think more the key to the needed processes). 


I think there are really good analogies to the current CAD->SIM->FAB revolutions
we've seen in other areas of engineering and manufacturing. We need to design most
clearly in terms of *meaning*, we want the system to simulate our designs to help us
see whether we've *said what we mean*, and, though copying a software system's bits
is easy and cheap, the main kinds of optimizations and fits to existing systems are not
(I'll take that part of the process as what we mean by “FAB”).


Even before we get to thinking about what the front end for this new take on software
engineering might be like, we can see that in most existing programming languages,
programming practices, and programs, *meaning* has not been separated out
from *optimization*: they are almost always commingled, even in attempts at very
high level languages. We could imagine a language in which optimizations were “hints
from the side”, and the programs themselves are just about meanings, but I don't
know of one currently.


A more often used practice is the opposite: to use rather ad hoc methods in writing 
the main program and then try to supply “meaning hints” from the side (in the form of 
assertions, unit tests, etc.). These don’t help the system make a better program, but 
just trap more errors. 


We could imagine a future system in which most of the constraints expressed by the unit
tests, etc., *are* the main program, and the pragmatics and optimizations are 
automatically created to deal with constraints on resources. 
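As a rough illustration of "constraints as the main program" (a toy sketch, not a description of any existing system; `sort_by_meaning`, `fast_hint`, and `run` are hypothetical names invented for this example), the meaning of sorting can be stated as a constraint, a generic search can satisfy it slowly, and a faster routine can be supplied as a "hint from the side" that is used only when it agrees with the constraint:

```python
from collections import Counter
from itertools import permutations


def is_ordered(xs):
    """Constraint: each element is <= the one after it."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def sort_by_meaning(xs):
    """The 'program' is only the constraint; a generic search finds a result."""
    # permutations() only yields rearrangements of the input, so
    # is_ordered is the one constraint left to check.
    return list(next(p for p in permutations(xs) if is_ordered(p)))


def fast_hint(xs):
    """Hypothetical optimization hint: one pragmatic way to get the result."""
    return sorted(xs)


def run(xs, hint=None):
    """Use the hint when it satisfies the constraints, else fall back to the meaning."""
    if hint is not None:
        candidate = list(hint(xs))
        if is_ordered(candidate) and Counter(candidate) == Counter(xs):
            return candidate
    return sort_by_meaning(xs)


print(run([3, 1, 2]))                  # meaning only
print(run([3, 1, 2], hint=fast_hint))  # meaning plus an optimization hint
```

Both calls print `[1, 2, 3]`. A wrong hint cannot change the result, because the constraint stays the authority and the hint is only a candidate.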


Most of these comments apply to the impetus for “higher level languages” in the late 
50s through the 60s and 70s. Many of the inventors of high level languages back then 
wanted “more meaning and less hacking”. 


From my perspective, this impetus has not been carried through strongly enough 
since the commercialization of personal computing and networking that started in the 
80s. Most of the languages used today do not go beyond the languages of the past, 
and it is also quite startling to see so many programmers of today not really using 
"CAD" for programming, but instead there is much clinging to simulated Vi terminals, 
which are simulated punched cards, etc. (It is quite startling to see the CAD tools used 
in the other “real” engineering professions!) 


Just one more comment here ... If there was a real “computer science” we could 
imagine several ways to approach this problem. For example, it is highly likely that 
programming and debugging in terms of *meanings*, especially for systems that have 
real-time constraints, is going to require a lot of computing power to do, and 
especially to initially develop. This would suggest that such computer scientists would 
be developing their ideas on super-computers of one sort or another (at least what 
used to be called “work-stations”, if not full-fledged larger more powerful systems). In 
other words, to do CAD->SIM in ways that might not directly be connected to 
deployment, but to use the FAB part to do the work needed to deploy. 


Instead, what we see almost everywhere today are programmers programming on the 
target systems, and this has most things wrong with it — not the least that it almost 
forces premature optimization, which then puts a chill on further designing and 
reformulation of the design. 


I will submit here what I think is a much better process, one that was used in the
ARPA/PARC community in the 60s and 70s: do development on “computers of the 
future” to find out what is actually needed, and then find ways to deploy (one of these 
is that it takes a while to develop software, so you should at least use computing 
power of several years in the future). Another is to also develop hardware in various 
ways to help both development and eventual pragmatic deployment. 


It's hard to take our not-quite-a-field seriously when most of its practitioners don't 
seem to take it seriously. 


Alan Kay
I'm the “Alan Kay” in question (try Google for the usual misinformation) - 7y


Where can I find/listen to music recordings featuring Alan Kay the
computer scientist? 

I played jazz guitar around NY and Denver from about 1957 to 1966, and wound up
quitting the next year because I couldn't do the practicing I needed and do grad
school at the University of Utah at the same time. (I was the kind of player that needed
to practice a lot to keep up with players of more intrinsic ability.) I don't know of any
extant recordings from then (although one of the groups I played in did have some
recording sessions in the late 50s). Many years later I started classical pipe organ (but
not as a professional) for my own pleasure.


Alan Kay
I read a lot - 7y


What was the first novel for Daniel Defoe? 


I didn't know much about Daniel Defoe -- except Robinson Crusoe, Moll Flanders,
etc., and that he was very early in the English Novel and a prolific writer. 


But the Wikipedia article was very interesting — did you consult it? — it lists Robinson 
Crusoe as his first novel — and the sketch of his life got me to start looking at more 
primary materials (because I've found Wikipedia articles on things I have detailed
knowledge about to be inaccurate). 


This prompted me to go to Amazon and order "Daniel Defoe: The Life and Strange, 
Surprising Adventures” by Richard West for 36 cents plus postage (seems like a good 
deal for anyone interested in ideas). 


Alan Kay
Many years designing user interfaces, learning systems, theatrics, etc. - 7y


How does one explain the phenomenon of “being on autopilot"? 


Almost all of our behaviors -- including much of what we like to call “thinking” -- are
done “on autopilot”.


Perhaps too simple -- but I think still useful -- is what Daniel Kahneman in his book
“Thinking, Fast and Slow” calls “System 1” (all the mechanisms we have to deal with
things quickly in real-time) and “System 2” (all the mechanisms we have to ponder
slowly). He calls these “explanatory fictions” (we can think of them as “useful fictions”).


System 1 is quick to react and difficult to teach (and unteach). System 2 is slow to
react but can learn more readily. With most things we start out using System 2, and
eventually System 1 is able to take over more of the routine tasks.


A good example is that I can tell you that I'm going to slam a door and then proceed
to do so. Despite my having warned you, you will instantly jump and start secreting
adrenaline, dopamine, serotonin, etc., all of which help in “fight or flight” reactions.


Similarly, no one goes on a roller coaster expecting to die, but your System 1 doesn't 
know about this benign expectation (it really can't communicate much with System 2), 
so it reacts violently. The pleasure at the end comes from the high from the "free 
dope” you've just given yourself. 


If you remember learning to drive, there was a lot of chaos, tunnel vision, difficulty of
hearing your instructor or parent, what gear?, what stop sign?, what child in the road?
This is the general state when confronted with real-time demands for most behaviors
without a trained System 1 to react quickly and more or less reasonably, when only the
slow System 2 tries to deal with the demands. (This is what happens with many learning
situations for children in school.)


Weeks later you are driving along, having a conversation with the person next to you,
seeing stop signs, children playing, etc. And you are mostly aware of the
conversation, and only aware of the situation outside when something needs to be
dealt with.


The tradeoffs have to do with our limited capacities. Our mind is set up to routinize as 
much as possible. And most of the mechanisms in our brain are aimed at 
routinization. This makes us more efficient at the cost of flexibility. And at the cost of 
not being very aware of what and why we are doing most things. 


Alan Kay
I was the main designer of the Parc GUI - 7y


How can you add more value as a user interface or user experience designer 
in the team? 


The reason even the few good UI designs are “so-so” is that we still don't understand 
enough about human beings to do great designs, especially ones that — like musical 
instruments — can elevate human expression and thought as they are learned and 
used. 


The reason most UI designs are pretty terrible is that most UI designers actually know 
very little about humans and most aspects of design. Add to this that so many 
“computer people” are indifferent to UI (some are actively against putting time and
effort into it), and you have a recipe for further bad designs. 


The slippage in standards has been great (e.g. most apps on smartphones and tablets 
aren't set up for UNDO; they aren't set up to help the users learn the gestures that 
would make them powerful users, etc.) And these comments are just on UI ideas from 
50 years ago. 


So: put a lot of effort into learning what is known about human psychology and 
anthropology (including the biological aspects). Learn about human learning of 
various kinds. Look at great designers and UI thinkers within and outside of 
computing. And try to teach your colleagues on your team about UI and why it is 
really critical. 


Alan Kay
I read a lot - 7y

Are there any books (apart from ‘Dealers of Lightning’ and 'Fumbling the 
Future’) about PARC's rise and fall from grace? 

“Fumbling The Future” is a poor book to read. Michael Hiltzik, who wrote “Dealers of 
Lightning", did a fair amount of research, but the book is marred by “the heroes’ 
journey” tropes and confusing timelines. 


“The Dream Machine” by Mitchell Waldrop, on the other hand, is very good, in part 
because it is the story of the larger enterprise started in the early 60s by JCR Licklider 
for ARPA, and PARC was an extension of this mostly stocked with former ARPA 
researchers. In this history you get the whole sweep of the very romantic initiative. 
There are a few quite minor errors, so this rates high in accuracy also. 


You might also be interested in a tribute I wrote to the larger community: “The Power
of the Context”, which has an extensive bibliography that includes a number of the
personal histories of PARC researchers.


Alan Kay
Had something to do with “Object-Oriented Programming” - 7y

Has anyone criticized Alan Kay? 

As Mark Miller mentioned, Dijkstra once said that “Object-Oriented Programming was
such a bad idea that it could only have come from California”! However, he and I were
friendly, so I took that as a backhanded compliment. My research group at Xerox Parc
was known as “The Lunatic Fringe”, usually (I thought) affectionately. I don't count
trolls on reddit and slashdot and elsewhere.


There are two kinds of criticism that one can't do much about. One is represented by 
difficulty in getting *good* funding (meaning funding that is like the classic
ARPA-IPTO funding of the 60s, and its continuation in many ways at Xerox Parc). Those
changes represented a change in how people saw how societal and business 
investment should be done, even as the fruits of the very different ARPA way to fund 
research were generating an entirely new industry and literally trillions of dollars of 
new wealth. 


The second form of criticism is also oblique: failure of most people to put in the effort
to more fully understand the ideas that were really good. Engelbart is the prime
example here -- the lack of interest in CS in his best ideas has been distressing for
years -- but this has been the case for most of the best ideas that came from Parc as
well.


Certainly, the greatest critic I'm aware of is myself. My temperament is basically that of 
a romantic idealist, and a "starter" rather than a finisher. I've been on myself since a 
kid for everything in myself that doesn't measure up to idealizations, and for not 
doing a good enough job getting stuff done. And still am. 


I was very lucky 50 years ago to have accidentally wandered into one of the ARPA
projects (Utah) where they were engaged in the most romantic technological ideas I
had seen: personal computing and world-wide networking. They also had a zeitgeist
about how to go about doing things (that I later learned came from how the
precursors of these efforts were learned in WWII -- especially the radar effort at
Building 20 at MIT). I started to acclimate into this rhythm, and this required a
combination of compartmentalizing the ideals into a room I could visit but was no
longer my whole world, learning to work better, some therapy to deal with the
positive and negative consequences of forming a group (which among other things
would contain "finishers"), etc.


I just saw a wonderful thing. Johannes Haushofer of Princeton has just posted a CV of
his failures: https://www.princeton.edu/%7Ejoha/Johannes_Haushofer_CV_of_Failures.pdf
This appeals to me very much, and I will try to follow suit this year.


Finally, it's worth looking at "crazy". As Korzybski and others before him have pointed 
out, we are doomed by nature to deal with the world via mere beliefs that just can't be 
in accord with "what's out there?". We are all delusional, and by rights should consider 
ourselves and others as “crazy” -- that's a good place to start in trying to better "see" 
the world (and it's where science starts). 


However, humans generally consider that which is plentiful in their environment as 
“normal", and that "crazy" is a set of beliefs that is not plentiful. This obtains to some 
extent in science as well, because scientists are human also, and it's hard not to drift 
into believing even when one is trying to be scientific. 


Some percentage of people find "normal" less binding, and can generate "crazy" ideas 
more easily. Most ideas are mediocre down to bad (it's quite difficult to have a good 
idea, just because they are so rare and unusual). However, if I can use a baseball
analogy, there are two kinds of "not as hoped". A good batter will still perhaps fail to
get a hit more than 70% of the time, and this “not get a hit” is not considered an error,
but intrinsic overhead when trying to do something difficult. On the other hand, not
catching a fly ball is considered an error, because it is supposed to be within the range
of learned technical skill. If we translate this into the world of technology, having really
good ideas is like trying to hit the baseball: we are lucky if a small percentage are
really good. But it is an error if we can't make a good idea into an artifact that allows
testing out the idea (we should have learned the technique to do that).


A good heuristic here is from Yogi Berra “You can't think and hit!". Meaning, you have 
to let the ideas come without a lot of prejudgment (swing the bat!) Then think a bit. 
This is kind of “crazy”. 


The fun thing about our century is that there is enough known about the physical and 
mental worlds to allow some vetting of crazy ideas. And lo and behold, some of these 
look as though they could actually be pulled off, perhaps with a decade or more of 
effort. After this effort is done, there is even more opposition to the validated idea (it 
is an attack on many people's sense of identity and beliefs). 30 years or more later it 
has become a new normal -- and a new barrier for making progress. 


From my standpoint, the "craziest" idea that I've had -- not an original one at all -- is 
that education and its processes can be improved enough and deployed universally 
enough to counter our dangerous genetically endowed tendencies, which might have 
been useful for survival once, but which now threaten both our species and our planet. 
There is no intrinsic reason for supposing that the little bit of progress we've made in 
these directions via learning and changing our cultures could be carried far enough to 
not just stave off disaster, but to create a new conception of humanity. 


Still, I'm most definitely crazy enough to keep working on this. 


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 7y


As a computer science major, I want to develop my mathematical problem
solving skills. So, what books are good for developing problem solving 
skills? 


I like both the previous answers, especially the “until your fingers bleed” one! I should
add the classic by Polya, "How To Solve It", to the reading list. For a more contemporary
look I suggest tuning into the Heidelberg Laureate Forum (“Abel, Fields and Turing
Laureates Meet the Next Generation”) in September, where Fields, Abel and
Turing Award winners gather each year to meet with students and talk about
issues. I noticed this year that one of the sessions will be on the expanded range of
“what constitutes a proof” these days.


Back to “bleeding fingers” and a musical analogy. It helps a bit to read books, etc.,
about the musical instrument and music you are trying to learn to play, but the main
problems have to do with teaching your interior "System 1" and "System 2" [1] that the
instrument is an extension of your body, mind, will and art.


A big part of this is “training intuition", particularly for understanding and for 
improvisation. When you start off in any of these areas it is like what learning to drive 
a car was like -- things are happening too fast, tunnel vision, you don't know what is 
going on, etc., -- because you need "System 1" to have set up hundreds of things that 
are automatically paid attention to, and this hasn't happened yet. 


Throw in "solving" for “composition”, "improvisation", "proving", "inventing", etc. and 
it's initially like being in an ATV in the dark driving across an unknown landscape with 
a 1 watt bulb. You can't "see" where it would be good to go! Many “bleeding fingers”
later your headlights illuminate more and more and allow you to pick fruitful paths 
through the obstacles and distractions. 


When I was a math major I asked one of the grad students how to do it, and he said
“Well, you have to guess the answer, and that will tell you how to do the math”! Some
“bleeding fingers” later, I was starting to make better guesses.


(A good heuristic here is to find a way to avoid being distracted by your guesses, even 
when they get better -- System 1 is quite a terrible thinker, and most ideas are 
mediocre down to bad!) I like to write down guesses in a notebook and then not look
at them. This allows much more of the terrain to be seen. 


The other heuristic is a more drastic one, and more useful outside school: "the problem 
is not the problem" -- meaning: it is worth finding the real problem before trying to 
solve the apparent problem (which is usually tied to a weak and obsolete context). I
used to get in trouble in school for ignoring the problems I was asked to solve and
going after ones that seemed “better” (but this propensity helped greatly after I got
out of undergraduate levels and started working on real problems). 


And, if you are a computer science major, then it is important for you to understand 
that “mathematics” is a plural for a reason (namely, so we can invent a math when we 
need to make progress). Computing constitutes a new area of relationships which 
allows mathematical possibilities, but some of the important properties require new 
ways to think about them (some of how computing has gone astray has come from 
trying to apply too much old-style thinking to what is new and different). 


Finally, when a mathematician comes in strong contact with real engineering there is a 
moment of shock when both the nature of proof and much of the method can be 
seen as "good engineering" (in this case of *relationships* -- or as von Neumann liked 
to say “mathematics is relationships about relationships"). Engineers are often a bit 
cavalier about math, but they have the same shock when they really get into it, and 
often find that some of the heuristics that work for math also apply in physical world 
thinking. 


The current day Venn diagram of these processes has (in historical order) Tinkering, 
Engineering, Mathematics, and Science -- most of the expert practitioners find the 
sweet spot in the intersection and use aspects of all to make progress. I advise
“bleeding your fingers" to become an expert in all four. 


Footnotes 


[1] Thinking, Fast and Slow


Alan Kay
Have designed a few programming languages - 7y


Can we have garbage collection and memory management on a fully- 
compiled language? That is, no VM or interpreter required? 
Sure! 


As others have also answered "yes", I thought I'd just add a couple of side notes.


First, there are many techniques for storage reclamation, and it's worth looking at 
them. We used reference counting at Parc in the 70s because we had real-time music 
and animation concerns on relatively small machines. However, when we did Squeak 
20 years later, there were many more GC techniques to draw on that could handle 
real-time very well, and provide other services not easily done with reference 
counting. (We used a "generational GC with 'tenuring'" that constantly collected a
little, and always provided "head-room" that allowed massively parallel real-time
processing -- part of the idea was due to Dave Ungar.)
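For readers who have not seen reference counting, here is a minimal toy sketch of the general technique (an illustration only, not the Parc implementation; `Obj`, `retain`, `release`, and `link` are hypothetical names). Each object carries a count of incoming references, and storage is reclaimed at the moment the count drops to zero, so the work arrives in small steps tied to individual releases:

```python
class Obj:
    """A toy heap object with an explicit reference count."""

    def __init__(self, name, freed_log):
        self.name = name
        self.count = 0             # number of references pointing at this object
        self.children = []         # objects this one refers to
        self._freed_log = freed_log


def retain(obj):
    """Record one more reference to obj."""
    obj.count += 1
    return obj


def release(obj):
    """Drop one reference; reclaim the object when none remain."""
    obj.count -= 1
    if obj.count == 0:
        obj._freed_log.append(obj.name)   # stands in for freeing the storage
        for child in obj.children:
            release(child)                # the dead object's references go too
        obj.children.clear()


def link(parent, child):
    """Make parent refer to child."""
    parent.children.append(retain(child))


freed = []
a = retain(Obj("a", freed))   # root reference held by the "program"
b = Obj("b", freed)
link(a, b)                    # a -> b
release(a)                    # dropping the root reclaims a, then b
print(freed)                  # ['a', 'b']
```

One well-known limitation of plain reference counting is that objects referring to each other in a cycle never reach a count of zero, which is one of the services a tracing collector provides.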


For speed freaks (speed is actually quite low on the priority list these, and most, days), I
thought it would be fun to mention a kind of "ultimate" storage management that
Ned Irons (Edgar T. Irons) used for the first really practical extensible language IMP
(CACM Jan 1970). This was done on a supercomputer of the day (CDC 6600) and what
he wanted was *no* overhead on dynamic arrays compared to statically allocated
arrays. He accomplished this by extending each array with space to hold back pointers
to the loaded code that used the array address. The basic idea here is that the arrays
would generally be much larger than the number of back pointers needed. Then when
an array was extended/moved, he could use the back pointers to fix up the
addresses in the code itself. Eeecks, but it worked quite well! It was essentially making
“the linking-loader” dynamically active, etc.
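The back-pointer idea can be sketched with a toy model (purely illustrative; this is not IMP's actual layout or the CDC 6600 instruction format, and `DynArray`, `emit_load`, and `move` are hypothetical names). The "code" embeds absolute addresses, so an access needs no indirection, and the array remembers which code slots must be patched if it ever moves:

```python
class DynArray:
    """Toy model: an array that remembers which code slots hold its address."""

    def __init__(self, base, size):
        self.base = base           # current absolute address in `memory`
        self.size = size
        self.back_pointers = []    # (code slot, offset) pairs that embed self.base


memory = [0] * 32
code = []                          # each entry: ("LOAD", absolute_address)


def emit_load(arr, offset):
    """'Compile' a load with the array's absolute address baked in."""
    code.append(("LOAD", arr.base + offset))
    arr.back_pointers.append((len(code) - 1, offset))


def move(arr, new_base, new_size):
    """Relocate/extend the array, then patch every instruction that used it."""
    memory[new_base:new_base + arr.size] = memory[arr.base:arr.base + arr.size]
    arr.base, arr.size = new_base, new_size
    for slot, offset in arr.back_pointers:
        op, _old_address = code[slot]
        code[slot] = (op, arr.base + offset)


def run():
    """Execute the 'code': each LOAD reads memory at its embedded address."""
    return [memory[addr] for op, addr in code if op == "LOAD"]


arr = DynArray(base=0, size=4)
memory[0:4] = [10, 20, 30, 40]
emit_load(arr, 2)                  # code now reads absolute address 2
before = run()
move(arr, new_base=16, new_size=8) # array moves; code is patched to address 18
after = run()
print(before, after, code)
```

The design choice this illustrates is where the cost lands: every access runs at static-array speed, and the bookkeeping is paid only on the comparatively rare move.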


Alan Kay
Still trying to learn how to think better - 7y


How does imagery affect tone in literature? 


This sounds like the "criticism" classes I was able to avoid in both English literature
and in music. More worthwhile using the time to do more reading and playing! (Since 
most ideas -- and books -- are mediocre down to bad, it's a good idea to encounter 
lots of them to better the chances of "cosmic collisions".) 


But you can also answer this question quite easily yourself just by looking at the 
official definitions of the terms. 


“Tone” is basically attitudinal: formal, carefree, humorous, sarcastic, etc. 
“Imagery” applies to sense-like descriptions: scenes, smells, sounds, etc. 


Pick an attitude and then pick sensory examples that help embody it or are opposed. 
Pick a sensory example and then pick attitudes that go along or the opposite. 


Good writers use lots of devices, but lots of devices don't make something worth 
reading. This is similar to the problem of sculpture: clay is completely malleable, but it 
can't simply be debugged into art. 


Vision is needed. 


Much better to start with searching out important ideas that are worth explaining to 
others, and then to find as many ways as possible to help them understand the 
importance. 


Alan Kay
Had something to do with "Object-Oriented Programming" - Updated 6y


I don't understand properly Alan Kay's quote about his math background.
Can anyone explain it in layman terms? 


Sometimes things can be nicely simplified yet made more powerful at the same time. 
This is often “what ‘math’ is good for". 


When | looked at how Ivan Sutherland had organized the objects in Sketchpad -- 
(1962) the first real interactive graphics system (and much much more) -- one of the 
things he did was to have each object have a "place" called “display” and all you had 
to do to get any kind of object to display itself was to call the subroutine at "display" 
(which now became a generic label for the “idea of display"). 


This removed the need for the invoker to know how to display, or to know the
particular name for the display routine. Each object was its own name-space, and Ivan
realized (with the help of some ideas from Doug Ross and others) that "separating 
concerns" like this really simplified design and programming. 
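In a present-day language the "call whatever is at display" arrangement might look like the following sketch (a loose modern analogy, not Sketchpad's actual structure; the `Line` and `Circle` classes and `show_all` are hypothetical):

```python
class Line:
    def __init__(self, start, end):
        self.start, self.end = start, end

    def display(self):
        # How a line shows itself is known only inside Line.
        return f"line from {self.start} to {self.end}"


class Circle:
    def __init__(self, center, radius):
        self.center, self.radius = center, radius

    def display(self):
        # A circle answers the same generic request in its own way.
        return f"circle at {self.center} with radius {self.radius}"


def show_all(objects):
    """The invoker knows only the generic name 'display', nothing else."""
    return [obj.display() for obj in objects]


for text in show_all([Line((0, 0), (1, 1)), Circle((2, 2), 5)]):
    print(text)
```

Adding a new kind of object requires no change to `show_all`, which is the "least knowledge" payoff: the invoker depends on one shared name and on nothing about how each object does its work.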


The idea of "least knowledge" is a very powerful way to think about making modules 
that can cooperate. 


This is what “algebras” in math do. They started out as generalizations of things and 
operations like + and * and identities, etc. that could be used over a wide variety of 
things. This often works nicely because there are often many more kinds of things 
than there are powerful operations. And there are often many more ways to do things 
than good things to do (for example "sort" is a powerful idea, and there can be many 
dozens of different ways to sort, but the end result is the same. The differences are 
“pragmatic” rather than “semantic”).
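A small sketch of the "few powerful operations over many kinds of things" point (an illustration under stated assumptions; `combine_all` is a hypothetical helper, and the only requirement placed on each operation is that it is associative and comes with an identity element):

```python
from functools import reduce


def combine_all(items, op, identity):
    """Fold any collection using one associative operation and its identity."""
    return reduce(op, items, identity)


# The same generic idea works over numbers, strings, and lists.
total = combine_all([1, 2, 3, 4], lambda a, b: a + b, 0)
product = combine_all([1, 2, 3, 4], lambda a, b: a * b, 1)
text = combine_all(["ab", "cd"], lambda a, b: a + b, "")
merged = combine_all([[1], [2, 3]], lambda a, b: a + b, [])

print(total, product, text, merged)
```

`combine_all` never needs to know what kind of thing it is combining. The operation and identity carry the meaning, and how each type actually performs the operation stays hidden inside that type.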


My inner hit was that “wow, we can make a really powerful but simple as possible
interface language between objects, and we can hide the pragmatics inside of objects,
etc.”


Today this property of modules/objects is usually called "polymorphism" (a not quite
accurate term, but I think Peter Wegner tagged it, and it has stuck).


Alan Kay 
Still trying to learn how to think better - 7y 


What are proxemics in drama? How are they implemented? 


I don't think I know anything special about this subject, beyond just the rules of
thumb that are used on the stage to relate actors to each other spatially, and 
sometimes to “special objects". 


For a given play and given actors, the same director can be very influenced by the size
and kind of stage (proscenium, thrust, in the round, etc.) and the amount of rake and
upstage available. And staging will normally be quite different for both movies and TV,
where cutting, closeups, etc., can be quite different because of the different visual
angles available.


Staying with the theater, actors "need stage", so many scenes of close relations are 
sometimes staged with the actors further apart than they would be in real life, but 
with the use of voice, posing and gestures to create intimacy. As an aside, "posing" is 
a critical element of theater, where the distance from the audience and the desired 
projection of “generalizations beyond the characters" require much slower movement 
than real life, and something more like mime and tableau, to help the audience 
understand what is going on and being felt. I mentioned the importance of "masks", in
a previous answer, as another critical element to get beyond "people walking and 
talking". 


A larger set of things to look into is the role of caricature in all arts. “What's there" is 
put there to help the audience make something special inside their heads, so much of 
the “art of Art" is choosing "what" to put "where". 


Alan Kay
Still trying to learn how to think better - 7y


What is the primary reason Willy Loman kills himself in Death of a 
Salesman? 

One way to approach this is to look up the definition(s) of classical "tragedy" and see
what a playwright might try to do today in a more inclusive society in which there is
an attempt to treat the larger bulk of humanity as important.


Then, you can ask the first question here -- and get a handle on the answer: what was 
Miller trying to accomplish in this play? 


The next question might be: how successful was he for you? 


My opinion doesn't matter. This is about you and your question. But just to slide my 
perspective into the conversation: I think Miller was more successful in raising the idea
-- that any human life can be tragic -- than he was in making this particular play work. 


To me, a much more interesting book (not a play) is "Man's Search For Meaning" by 
Viktor Frankl. I think it gets to more of the important matter than Miller does.


Alan Kay
Still trying to learn how to think better - 7y


What were Sophocles’ contributions to theatrical drama? 
I thought the Wikipedia article on Sophocles was pretty good -- I suggest perusing it.


The corresponding Wikipedia article on Greek theater was less comprehensive -- and
then there is the question of "theater" itself, and the means and purpose of the 
tragedy. A good book to look at is “The Magic Mirror" by George Jean Nathan. The 
title is one of the ways theater people think of what the theater is about: its purpose is 
to "beam the audience back out at themselves". Coleridge, who was a theatrical critic 
as well as a poet, once said “People go to bad theater hoping to forget, but they go to 
good theater tingling to remember". As with many arts, the aim is not so much to 
teach as to evoke. 


Because of the strange -- and very "theatrical" -- ways our minds work, we are able to 
be drawn into a kind of waking dream that is real enough for tears, anger, excitement, 
and most other human emotions. Part of the "trick" is to find ways to get the audience 
out of "seeing people walking and talking" and instead have the audience enter into 
the world of the characters, where there are no actors, just the special reality we can 
make with our minds. One of the ways the Greeks accomplished this was through 
masks (it is worth using the Internet to find out about masks and how they and 
makeup and costumes have been used for thousands of years to help imaginations 
see what "needs to be seen"). It's important to know that Greek audiences could be
frightened by some of the masks, and to realize what this means. 


The closest normal stage process to Greek Theater today is the oratorio (e.g. Handel's 
Messiah). There are a few "soloists" -- usually no more than three (and Sophocles was 
credited for adding the third one) -- and a chorus. The level of actual singing is not 
completely known. (Both the "opera" and the “oratorio” were late 16th century 
inventions to try to revive something like Greek theater.) 


My most direct personal experience with Sophocles was in the staging of Oedipus at 
Colonus. This was done as much as possible the way it is thought the Greeks might have done it.
Besides the oratorio-like staging and delivery, the thing that worked to get all of us 
into it was to attend a number of Catholic Masses done old style in Latin with all the 
traditional touches. Why? Because Greek theater came out of the religious "mystery 
plays" that often had strong touches of the transcendental, even to the enhabitation 
by a god. 


When everything is "working", the stylized declamations and almost no physical 
movement, coupled with the masks and music, create something like an otherworldly
mythic transcendent experience, more associated with religion and with art music 
today than with most theater. It is the opposite of “naturalistic”. 


In the production I helped with, we were lucky to have an actor with a truly enormous
voice, and all the stylizations fit him perfectly. His vocal impact turned out to
be critical to help the audience get used to "tableaus" and "posing", etc.


I have since seen a number of Shakespearean productions -- mostly in the UK --
staged very similarly to what little we know about Greek theater practice. I found them
especially effective in getting the audience to really listen to the words *as the action*.


Alan Kay
Had something to do with "Object-Oriented Programming" - 7y

Is a microservices architecture with RESTful APIs an implementation of Alan 
Kay's concept of object-oriented programming? 

I was asked to write a history of Smalltalk for the ACM History of Programming
Languages II conference in the early 90s -- an online version is: The Early History Of
Smalltalk. This will make more clear some of the influences.


For our purposes here, while a grad student at an ARPA project in 1966, besides the
catalyzing influences of Sketchpad, Simula, and Biology, I was interested in (a) the
notion of "processes" in operating systems (they were sometimes called "virtual
machines"), which were rather similar to what are called "processes" or "tasks" today,
i.e. protected entities, encapsulated via MMUs, that acted like the entire hardware, and
(b) the ARPAnet project, which was just starting to be talked about and which proposed
to use packet multiplexing.


The simple insight I had (just about 50 years ago now) was that you only needed
"computers all the way down" -- because "a computer can do anything a computer
can do".


I doubt that I was the first to have this insight (it's so simple -- almost tautological) but
if not the first, then I think I might have been the most cosmically taken by the idea. It
was so powerful in so many areas, especially because I was not of the psychological
type that delights in "cleverly coping". It was kind of a "math insight" (one of my
majors in college was pure math, specializing in algebras). Basically, semantics would
be in terms of behaviors, and pragmatics could be quite orthogonal (and
interpluggable).


This followed very strongly the simulation perspectives of Sketchpad and Simula -- if 
your basic element is "a computer" you can make it simulate anything, including basic 
elements. This fit strongly into another zeitgeist of the times: extensible “problem 
oriented" languages. 


Combined with the “protected process" ideas, it seemed pretty clear that you didn't 
want a sender to be able to control a receiver. So you would start out with “requests” 
rather than "commands". However, because you could simulate anything, you could 
choose to have receivers interpret some requests as commands (and thus you could 
simulate old-style programming with data structures and assignments). But this was 
clearly a very bad idea because it violated encapsulation of a process. And this 
violated modularity -- whose whole point is to isolate and concentrate knowledge in a 
complex system. 


A “request” oriented system is one made up of servers with services, and the servers 
can choose to serve or not. (Take a look at CAL-TSS ca 1969 by Butler Lampson for an 
example of a “capability OS" design.) Also note that there were several other OSs 
around or just happening (like Unix which was kind of going in 1971 and announced 
in 1973) that were intended to have lots of processes that could intercommunicate via 
messages that could be interpreted by the receivers. 
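
As a rough illustration of "requests rather than commands", here is a minimal Python sketch. Everything in it (the Account receiver, the receive method, the Declined exception, the "_on_" naming scheme) is a hypothetical name invented for this example, not anything from Smalltalk or CAL-TSS. The sender can only ask; the receiver decides whether and how to serve.

```python
class Declined(Exception):
    """Raised when a receiver chooses not to serve a request."""


class Account:
    """A hypothetical receiver: its state is reachable only via requests."""

    def __init__(self, balance):
        self._balance = balance  # senders never read or assign this directly

    def receive(self, selector, *args):
        # The receiver, not the sender, decides what a request means.
        handler = getattr(self, "_on_" + selector, None)
        if handler is None:
            raise Declined(f"request {selector!r} not understood")
        return handler(*args)

    def _on_balance(self):
        return self._balance

    def _on_withdraw(self, amount):
        # The receiver is free to refuse a request it considers unacceptable.
        if amount <= 0 or amount > self._balance:
            raise Declined("withdrawal refused")
        self._balance -= amount
        return self._balance


acct = Account(100)
acct.receive("withdraw", 30)           # served
try:
    acct.receive("set_balance", 10**6)  # there is no such service
except Declined as reason:
    print("declined:", reason)
```

Nothing here lets a sender assign to the balance, so the "old-style programming with data structures and assignments" mentioned above would have to be offered deliberately by the receiver as a service.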


Carl Hewitt's PLANNER was a set of ideas for "goal-oriented programming", and 
squinting the right way, one could see that this was a powerful way to think about 
“requests” (it was a real extension of the powers and problems one got with 
“interprocess communications"). 


I got interested in what all this meant for "language", and realized that in this kind of
scheme, the syntax was merely a set of forms on the requests, and that the semantics
would be supplied by the receivers. Most computer people are much too ad hoc
(turns out we were also even though we were trying to be good), so this wonderful
universe of degrees of freedom needs to be given some unifying principles. This was
again very much like coming up with "algebras" which unified a small number of ideas
over many different specifics that were nonetheless similar in important ways.
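
One way to picture "syntax as forms on the requests, semantics supplied by the receivers" is the small Python sketch below. The names send, Amount, and Route are assumptions made up for the illustration. The form (receiver, selector, argument) is the same in both cases; what "+" means is decided entirely by whoever receives it.

```python
def send(receiver, selector, *args):
    """The only 'syntax' here: a receiver, a selector, and arguments."""
    return receiver.receive(selector, *args)


class Amount:
    """Hypothetical receiver that interprets '+' as arithmetic addition."""

    def __init__(self, value):
        self.value = value

    def receive(self, selector, other):
        if selector == "+":
            return Amount(self.value + other.value)
        raise NotImplementedError(selector)


class Route:
    """Hypothetical receiver that interprets '+' as joining lists of stops."""

    def __init__(self, stops):
        self.stops = list(stops)

    def receive(self, selector, other):
        if selector == "+":
            return Route(self.stops + other.stops)
        raise NotImplementedError(selector)


total = send(Amount(3), "+", Amount(4))
trip = send(Route(["A", "B"]), "+", Route(["C"]))
print(total.value, trip.stops)
```

The unifying principle in this toy is the "+" form itself, which plays a role loosely like an operation in an algebra: one idea, many specific carriers.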


I was also interested in the problems of definition, where you always have the
problem of "in terms of what?", and "computers all the way down" was always a
very good ultimate answer.


What remained was a lot of design probes and serious software engineering, to come
up with something that was quite simple and completely comprehensive at the same
time as being fast enough to do all the real-time user interface and other computing
we wanted to do. (This had very little in common with later things called "object
oriented programming" -- why didn't they just give it a different name? Guess ...)


For those who are interested, we resurrected one of these early systems from the 70s
at Xerox Parc, literally from a disk pack that Xerox had thrown away. I used it for a
tribute to Ted Nelson, and did all the visual materials for the talk with it: Alan Kay's
tribute to Ted Nelson at "Intertwingled" Fest.


Our view of all this stuff after the fact is that "real objects are a really good idea" (and
that any kind of meaning on a computer has to be done via some form of
interpretation, and this requires some form of *process* idea). We think that much
better detailed designs than ours are possible now. And we think that much of
what exists now is much too messy and ad hoc to scale reasonably. I.e., we all need to
do better!


For example, I really wanted to deal with the "coordinated state" problem by using
some form of McCarthy's "fluents" and "system level transactions". We and others at
Xerox Parc looked at parts of this but did not get them into our practical systems. This
is still an important issue, and a lot more is now known about how to do this at
different scales: http://www.vpri.org/pdf/tr2011001_final_worlds.pdf


Now, to try to answer the question. First, it's always worth letting people make their 
own comparisons. What do you think? 


Second, I would say that REST found some of the good principles of making systems,
and missed some of the others. This is too bad, since it came later, and could have
been much more informed by earlier good attempts at design and systems building.
This is true of the web in general (which quite missed e.g. a number of Engelbart's
much more powerful ideas, and many others as well).


Best wishes! 


Alan Kay
I'm the "Alan Kay" in question (try Google for the usual misinformation) - 7y


Will the code on p.13 of the LISP 1.5 Programmers Manual by John 
McCarthy and friends ever be taught in a physics class? 


How about Biology in a Computer Science class? I think a way to make progress here
is to try to appreciate what of the past can actually help thinking about the future.


For example, there is a lot to inspire in Lisp and in McCarthy's ideas -- and perhaps in 
Smalltalk also -- but there's no question that much more is needed (and that much 
more can be accomplished). This means that using these old "very interesting" 
systems as metaphors can help, but that being too slavish about the details will 
mislead. 


Today we are in a situation where the most interesting computer artifact is the 
Internet, and much of its interest comes from how well it has scaled -- and much of 
the reason that it has scaled so well is that some of the principles of Biology were 
used in the design. Here we had to create "metaphors from the other direction" 
because the scaling of biology -- each of our bodies has trillions of cells, and each
cell has billions of informationally interacting components -- is still immense 
compared to the piddling little computer systems we can make. 


A worthwhile computer language and its artifacts to look at is Erlang -- which has
much in common with the first Smalltalks at Xerox Parc, and has very little in common
with the misnamed "object oriented" languages generally in use today. And there is a
newer, even more interesting actor-based language in a similar spirit called Pony,
which has more biological characteristics.


One thing to note is that as systems scale up it becomes harder and harder to 
centrally control them and harder and harder for the parts to interact in "gear-like" 
ways. A side-note is that it gets harder and harder to deal with various kinds of errors 
as well. Biology in contrast has “loose-coupling" at most levels of scale, and it also is 
set up so that many errors can be handled just by zapping (even if some of the things 
that get zapped are OK). Most computer systems by contrast are early-bound and 
tight-bound, and quite fragile to even simple errors and most kinds of zapping. 
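
To make "handling errors just by zapping" a little more concrete, here is a toy Python sketch, loosely in the spirit of Erlang-style supervision. It is an assumption about what such a scheme might look like, not a description of any real system; Worker and Supervisor are hypothetical names. The supervisor makes no attempt to diagnose a failure. It discards the failed part, state and all, and starts a fresh one.

```python
class Worker:
    """Hypothetical part that keeps a running total and fails on bad input."""

    def __init__(self):
        self.total = 0

    def handle(self, message):
        self.total += int(message)  # raises ValueError on non-numeric input
        return self.total


class Supervisor:
    """Knows only how to pass messages along and how to replace a worker."""

    def __init__(self, factory):
        self._factory = factory
        self._worker = factory()
        self.restarts = 0

    def send(self, message):
        try:
            return self._worker.handle(message)
        except Exception:
            # "Zap": throw the worker away, including any state that was
            # actually fine, and carry on with a new one.
            self._worker = self._factory()
            self.restarts += 1
            return None


sup = Supervisor(Worker)
for msg in ["2", "3", "oops", "4"]:
    print(msg, "->", sup.send(msg))
```

Note that the total accumulated before the bad message is lost along with the worker, which mirrors the remark above that some of the things that get zapped are OK.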


Psychologically this is a problem because a lot of learning computing is unfortunately 
learning to cope with what's not so good, and this quickly becomes a way of life, and 
then a “reality” that is hard to see past. 


Alan Kay
Invented and designed a few user interfaces ... - 7y

Who was specifically responsible for the invention of the "windows" GUI 
paradigm at Xerox PARC? 

Mr Rao has a good slant on this, in particular to give the overall credit to the larger
research community -- that of ARPA-IPTO. Many of us at Parc were graduate students
under ARPA-IPTO and we considered Parc "another ARPA project" (just supported by
Xerox because of the vanishing ARPA funding caused by the Mansfield Amendment).


By the time Parc came about (in late 1970), there were already GUIs within ARPA and 
elsewhere, and many of the elements we are familiar with today -- pointing devices, 
clipping windows and panes, buttons, some iconic, etc. -- were already in use in 
isolated systems. The overlapping window Parc GUI can take credit for being a 
comprehensive and general design, based on a large number of ideas, both theatrical 
and psychological. Most of the ideas in it were supplied by several of us in the 
Learning Research Group, including myself, Dan Ingalls, David Canfield Smith, Adele 
Goldberg, Larry Tesler, Ted Kaehler, etc. 


Alan Kay
I am the Alan Kay in question. - 7y


Why isn't Alan Kay's FoNC (Fundamentals of New Computing) project more 
discussed or replicated? 


I can't answer the question directly, but I can say a few words about the difficulties.
The project came from long held feelings by many of us that there was much too 
much code for the actual meanings in many systems (by factors of hundreds and 
sometimes thousands). 


We had a Smalltalk system (Squeak) that could do a lot in about 200,000 lines of code,
and we thought that about 100,000 of those were not needed to sustain a base (like an
OS), an IDE, and an object system and interface that was like "apps" but much more 
integrated. Since the meta ideas in Smalltalk dated back to the late sixties and early 
70s, and there had been some progress in meta-definition since then, we arbitrarily 
thought it would be fun to try for another factor of 5 for “vanilla personal computing": 
namely, something that could run on bare hardware that could sustain what most 
people did with personal computers (kind of the functionality of MS Office). 


We got quite a bit done over the 5 years of the project and the yearly reports can be 
found on the Viewpoints website (Viewpoints Research Institute). Through some
personnel and other issues, we didn't get the very bottom of the system to quite 
support the rest with a total of under 20,000 lines of code, so we didn't make our goal, 
and then the funding ran out. (A lot did get done in under 20,000.) However, we still 
have an itch to take another look, and there have been a few things invented since 
then that will help, especially at the lower levels. 


It's worth pondering the difficulty of pinning down the goals here. In particular, it is 
difficult to express the metes and bounds of what parts of personal computing are 
being attempted and what parts of “operating systems" are trying to be subsumed. 
On the other hand, the goal is still a good one I think. 20,000 lines at 50 lines a page is
about a 400 page book, and such a book, if readable, would be a great advance in 
expressibility over the typical cryptic forms that are hundreds and even thousands of 
times larger. But the previous sentence is merely a claim that needs to be carried out 
to have real force. 
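
The line-count arithmetic in this answer can be checked in a few lines of Python. The figures are the ones quoted above; the variable names are just labels chosen for this sketch.

```python
squeak_lines = 200_000        # what the Squeak system took
not_needed = 100_000          # the part thought to be unnecessary
needed = squeak_lines - not_needed

target = needed // 5          # "another factor of 5"
lines_per_page = 50
pages = target // lines_per_page

print(f"target: {target:,} lines, about {pages} pages")
```

So the goal amounts to a system whose whole source would fit in a single book of roughly 400 pages.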


